- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.StreamingOperator
-
- com.pervasive.datarush.operators.DeferredCompositeOperator
-
- com.pervasive.datarush.operators.join.UnionAll
-
- All Implemented Interfaces:
LogicalOperator,SourceOperator<RecordPort>
public class UnionAll extends DeferredCompositeOperator implements SourceOperator<RecordPort>
Provides a union of two data sources. The input data is usually consumed as it becomes available. If the data order specified in the metadata of both inputs match, then the operator will perform a sorted merge preserving the sort order of the input. Otherwise the output will be ordered non-deterministically.The type of the output is determined by setting the union mode. If the outputMapping setting is set to MAPBYSCHEMA then a schema must be provided which will define the output type. Otherwise the output type can be automatically determined by setting MAPBYPOSITION or MAPBYNAME, which will determine an appropriate output type by mapping the two inputs by position or name respectively.
In the case where a target schema is provided the input fields are mapped to output fields based on field name. If a field in the output is not contained in the target schema, the field is dropped. If a field is contained in the output schema, but not in the input, the output field will contain NULL values. Input values are converted into the specified output field type where possible.
For example, if the left input contains fields {a:int, b:int, c:string} and the right input contains fields {a:long, b:double, d:string} and the target schema specifies fields {a:double, c:string, d:date, e:string} then the following is true:
- The output port schema will contain the fields {a:double, c:string, d:date, e:string}.
- Field b is dropped from both inputs.
- When obtaining data from the left input port, the output fields {d, e} will contain null values.
- When obtaining data from the right input port, the output fields {c, e} will contain null values.
- Field a from the left input will be converted from an
integertype to adoubletype. - Field a from the right input will be converted from a
longtype to adoubletype. - Field d from the right input will be converted from a
stringtype to adatetype. The format pattern specified in the target schema will be used for the conversion, if specified.
In the case where a target schema is not provided, the two input ports must have compatible types. In this case the operator will try to determine valid output fields based on the left and right input and two settings. The outputMapping setting can be set to MAPBYNAME if the left and right side should be matched by field name. Otherwise the operator can use MAPBYPOSITION and they will be matched by position. Additionally the allowExtraFields setting can be set to true if fields that are only present on one side of the input should be retained. If the field is not present in one of the inputs it will contain NULL values. If extra fields are present in one of the inputs and allowExtraFields is set to false an error will be thrown.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classUnionAll.UnionMode
-
Constructor Summary
Constructors Constructor Description UnionAll()UnionAll(RecordTextSchema<?> targetSchema)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected voidcompose(DeferredCompositionContext ctx)Compose the body of this operator.protected voidcomputeMetadata(StreamingMetadataContext ctx)This operator can execute in parallel.static RecordTokenTypegenerateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields)Generate a schema for the union of two records.booleangetIncludeExtraFields()Will be true if the generated schema will include unmapped fields from either side.RecordPortgetLeft()Returns the left input port.RecordPortgetOutput()Returns the output port.UnionAll.UnionModegetOutputMapping()Get how the output type should be determined.RecordPortgetRight()Returns the right input port.RecordTextSchema<?>getSchema()Gets the target record schema defining the output type.voidsetIncludeExtraFields(boolean includeExtraFields)Set to true if the generated schema should include unmapped field from either side.voidsetOutputMapping(UnionAll.UnionMode outputMapping)Set how the output type should be determined.voidsetSchema(RecordTextSchema<?> schema)Sets the optional target output schema.-
Methods inherited from class com.pervasive.datarush.operators.DeferredCompositeOperator
computeOutputTypes
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
UnionAll
public UnionAll()
-
UnionAll
public UnionAll(RecordTextSchema<?> targetSchema)
-
-
Method Detail
-
getLeft
public final RecordPort getLeft()
Returns the left input port.- Returns:
- the left input port.
-
getRight
public final RecordPort getRight()
Returns the right input port.- Returns:
- the right input port.
-
getOutput
public final RecordPort getOutput()
Returns the output port.- Specified by:
getOutputin interfaceSourceOperator<RecordPort>- Returns:
- the output port.
-
getSchema
public RecordTextSchema<?> getSchema()
Gets the target record schema defining the output type.- Returns:
- the record schema defining the output type
-
setSchema
public void setSchema(RecordTextSchema<?> schema)
Sets the optional target output schema. The input data sources will be coerced into this type.- Parameters:
schema- the record schema defining the output type
-
getOutputMapping
public UnionAll.UnionMode getOutputMapping()
Get how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Returns:
- the union mapping mode
-
setOutputMapping
public void setOutputMapping(UnionAll.UnionMode outputMapping)
Set how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Parameters:
outputMapping- the union mapping mode
-
getIncludeExtraFields
public boolean getIncludeExtraFields()
Will be true if the generated schema will include unmapped fields from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Returns:
- true if unmapped fields will be included in the generated schema
-
setIncludeExtraFields
public void setIncludeExtraFields(boolean includeExtraFields)
Set to true if the generated schema should include unmapped field from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Parameters:
includeExtraFields- if unmapped fields will be included in the generated schema
-
computeMetadata
protected void computeMetadata(StreamingMetadataContext ctx)
This operator can execute in parallel. It does not guarantee order so order meta-data for the output is not set and resolves to the default. Also the distribution cannot be maintained as a change in data types for fields used for distribution invalidates any current settings.- Specified by:
computeMetadatain classDeferredCompositeOperator- Parameters:
ctx- the context
-
generateSchema
public static RecordTokenType generateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields)
Generate a schema for the union of two records.- Parameters:
leftType- the left record typerightType- the right record typemode- the union modekeepExtraFields- if true fields only present on one side of the input will be retained- Returns:
- the combined record type
-
compose
protected void compose(DeferredCompositionContext ctx)
Description copied from class:DeferredCompositeOperatorCompose the body of this operator. Implementations should do the following:- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O) - Create necessary connections via the method
OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
composein classDeferredCompositeOperator- Parameters:
ctx- the context
- Instantiate and configure sub-operators, adding them to the provided context via
the method
-
-