- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.StreamingOperator
-
- com.pervasive.datarush.operators.DeferredCompositeOperator
-
- com.pervasive.datarush.operators.join.UnionAll
-
- All Implemented Interfaces:
LogicalOperator
,SourceOperator<RecordPort>
public class UnionAll extends DeferredCompositeOperator implements SourceOperator<RecordPort>
Provides a union of two data sources. The input data is usually consumed as it becomes available. If the data order specified in the metadata of both inputs match, then the operator will perform a sorted merge preserving the sort order of the input. Otherwise the output will be ordered non-deterministically.The type of the output is determined by setting the union mode. If the outputMapping setting is set to MAPBYSCHEMA then a schema must be provided which will define the output type. Otherwise the output type can be automatically determined by setting MAPBYPOSITION or MAPBYNAME, which will determine an appropriate output type by mapping the two inputs by position or name respectively.
In the case where a target schema is provided the input fields are mapped to output fields based on field name. If a field in the output is not contained in the target schema, the field is dropped. If a field is contained in the output schema, but not in the input, the output field will contain NULL values. Input values are converted into the specified output field type where possible.
For example, if the left input contains fields {a:int, b:int, c:string} and the right input contains fields {a:long, b:double, d:string} and the target schema specifies fields {a:double, c:string, d:date, e:string} then the following is true:
- The output port schema will contain the fields {a:double, c:string, d:date, e:string}.
- Field b is dropped from both inputs.
- When obtaining data from the left input port, the output fields {d, e} will contain null values.
- When obtaining data from the right input port, the output fields {c, e} will contain null values.
- Field a from the left input will be converted from an
integer
type to adouble
type. - Field a from the right input will be converted from a
long
type to adouble
type. - Field d from the right input will be converted from a
string
type to adate
type. The format pattern specified in the target schema will be used for the conversion, if specified.
In the case where a target schema is not provided, the two input ports must have compatible types. In this case the operator will try to determine valid output fields based on the left and right input and two settings. The outputMapping setting can be set to MAPBYNAME if the left and right side should be matched by field name. Otherwise the operator can use MAPBYPOSITION and they will be matched by position. Additionally the allowExtraFields setting can be set to true if fields that are only present on one side of the input should be retained. If the field is not present in one of the inputs it will contain NULL values. If extra fields are present in one of the inputs and allowExtraFields is set to false an error will be thrown.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
UnionAll.UnionMode
-
Constructor Summary
Constructors Constructor Description UnionAll()
UnionAll(RecordTextSchema<?> targetSchema)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected void
compose(DeferredCompositionContext ctx)
Compose the body of this operator.protected void
computeMetadata(StreamingMetadataContext ctx)
This operator can execute in parallel.static RecordTokenType
generateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields)
Generate a schema for the union of two records.boolean
getIncludeExtraFields()
Will be true if the generated schema will include unmapped fields from either side.RecordPort
getLeft()
Returns the left input port.RecordPort
getOutput()
Returns the output port.UnionAll.UnionMode
getOutputMapping()
Get how the output type should be determined.RecordPort
getRight()
Returns the right input port.RecordTextSchema<?>
getSchema()
Gets the target record schema defining the output type.void
setIncludeExtraFields(boolean includeExtraFields)
Set to true if the generated schema should include unmapped field from either side.void
setOutputMapping(UnionAll.UnionMode outputMapping)
Set how the output type should be determined.void
setSchema(RecordTextSchema<?> schema)
Sets the optional target output schema.-
Methods inherited from class com.pervasive.datarush.operators.DeferredCompositeOperator
computeOutputTypes
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
UnionAll
public UnionAll()
-
UnionAll
public UnionAll(RecordTextSchema<?> targetSchema)
-
-
Method Detail
-
getLeft
public final RecordPort getLeft()
Returns the left input port.- Returns:
- the left input port.
-
getRight
public final RecordPort getRight()
Returns the right input port.- Returns:
- the right input port.
-
getOutput
public final RecordPort getOutput()
Returns the output port.- Specified by:
getOutput
in interfaceSourceOperator<RecordPort>
- Returns:
- the output port.
-
getSchema
public RecordTextSchema<?> getSchema()
Gets the target record schema defining the output type.- Returns:
- the record schema defining the output type
-
setSchema
public void setSchema(RecordTextSchema<?> schema)
Sets the optional target output schema. The input data sources will be coerced into this type.- Parameters:
schema
- the record schema defining the output type
-
getOutputMapping
public UnionAll.UnionMode getOutputMapping()
Get how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Returns:
- the union mapping mode
-
setOutputMapping
public void setOutputMapping(UnionAll.UnionMode outputMapping)
Set how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Parameters:
outputMapping
- the union mapping mode
-
getIncludeExtraFields
public boolean getIncludeExtraFields()
Will be true if the generated schema will include unmapped fields from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Returns:
- true if unmapped fields will be included in the generated schema
-
setIncludeExtraFields
public void setIncludeExtraFields(boolean includeExtraFields)
Set to true if the generated schema should include unmapped field from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Parameters:
includeExtraFields
- if unmapped fields will be included in the generated schema
-
computeMetadata
protected void computeMetadata(StreamingMetadataContext ctx)
This operator can execute in parallel. It does not guarantee order so order meta-data for the output is not set and resolves to the default. Also the distribution cannot be maintained as a change in data types for fields used for distribution invalidates any current settings.- Specified by:
computeMetadata
in classDeferredCompositeOperator
- Parameters:
ctx
- the context
-
generateSchema
public static RecordTokenType generateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields)
Generate a schema for the union of two records.- Parameters:
leftType
- the left record typerightType
- the right record typemode
- the union modekeepExtraFields
- if true fields only present on one side of the input will be retained- Returns:
- the combined record type
-
compose
protected void compose(DeferredCompositionContext ctx)
Description copied from class:DeferredCompositeOperator
Compose the body of this operator. Implementations should do the following:- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O)
- Create necessary connections via the method
OperatorComposable.connect(P, P)
. This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
compose
in classDeferredCompositeOperator
- Parameters:
ctx
- the context
- Instantiate and configure sub-operators, adding them to the provided context via
the method
-
-