- All Implemented Interfaces:
LogicalOperator,SourceOperator<RecordPort>
The type of the output is determined by setting the union mode. If the outputMapping setting is set to MAPBYSCHEMA then a schema must be provided which will define the output type. Otherwise the output type can be automatically determined by setting MAPBYPOSITION or MAPBYNAME, which will determine an appropriate output type by mapping the two inputs by position or name respectively.
In the case where a target schema is provided the input fields are mapped to output fields based on field name. If a field in the output is not contained in the target schema, the field is dropped. If a field is contained in the output schema, but not in the input, the output field will contain NULL values. Input values are converted into the specified output field type where possible.
For example, if the left input contains fields {a:int, b:int, c:string} and the right input contains fields {a:long, b:double, d:string} and the target schema specifies fields {a:double, c:string, d:date, e:string} then the following is true:
- The output port schema will contain the fields {a:double, c:string, d:date, e:string}.
- Field b is dropped from both inputs.
- When obtaining data from the left input port, the output fields {d, e} will contain null values.
- When obtaining data from the right input port, the output fields {c, e} will contain null values.
- Field a from the left input will be converted from an
integertype to adoubletype. - Field a from the right input will be converted from a
longtype to adoubletype. - Field d from the right input will be converted from a
stringtype to adatetype. The format pattern specified in the target schema will be used for the conversion, if specified.
In the case where a target schema is not provided, the two input ports must have compatible types. In this case the operator will try to determine valid output fields based on the left and right input and two settings. The outputMapping setting can be set to MAPBYNAME if the left and right side should be matched by field name. Otherwise the operator can use MAPBYPOSITION and they will be matched by position. Additionally the allowExtraFields setting can be set to true if fields that are only present on one side of the input should be retained. If the field is not present in one of the inputs it will contain NULL values. If extra fields are present in one of the inputs and allowExtraFields is set to false an error will be thrown.
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidCompose the body of this operator.protected voidThis operator can execute in parallel.static RecordTokenTypegenerateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields) Generate a schema for the union of two records.booleanWill be true if the generated schema will include unmapped fields from either side.final RecordPortgetLeft()Returns the left input port.final RecordPortReturns the output port.Get how the output type should be determined.final RecordPortgetRight()Returns the right input port.Gets the target record schema defining the output type.voidsetIncludeExtraFields(boolean includeExtraFields) Set to true if the generated schema should include unmapped field from either side.voidsetOutputMapping(UnionAll.UnionMode outputMapping) Set how the output type should be determined.voidsetSchema(RecordTextSchema<?> schema) Sets the optional target output schema.Methods inherited from class com.pervasive.datarush.operators.DeferredCompositeOperator
computeOutputTypesMethods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyErrorMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Details
-
UnionAll
public UnionAll() -
UnionAll
-
-
Method Details
-
getLeft
Returns the left input port.- Returns:
- the left input port.
-
getRight
Returns the right input port.- Returns:
- the right input port.
-
getOutput
Returns the output port.- Specified by:
getOutputin interfaceSourceOperator<RecordPort>- Returns:
- the output port.
-
getSchema
Gets the target record schema defining the output type.- Returns:
- the record schema defining the output type
-
setSchema
Sets the optional target output schema. The input data sources will be coerced into this type.- Parameters:
schema- the record schema defining the output type
-
getOutputMapping
Get how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Returns:
- the union mapping mode
-
setOutputMapping
Set how the output type should be determined. Can be set to MAPBYPOSITION, MAPBYNAME, or MAPBYSCHEMA. If the first two options are used the output type will be automatically determined by mapping the fields in the two inputs positionally or by name respectively. Otherwise if MAPBYSCHEMA is used the schema must be set. Defaults to MAPBYPOSITION.- Parameters:
outputMapping- the union mapping mode
-
getIncludeExtraFields
public boolean getIncludeExtraFields()Will be true if the generated schema will include unmapped fields from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Returns:
- true if unmapped fields will be included in the generated schema
-
setIncludeExtraFields
public void setIncludeExtraFields(boolean includeExtraFields) Set to true if the generated schema should include unmapped field from either side. Only applies if MAPBYPOSITION or MAPBYNAME mode are used. Defaults to false.- Parameters:
includeExtraFields- if unmapped fields will be included in the generated schema
-
computeMetadata
This operator can execute in parallel. It does not guarantee order so order meta-data for the output is not set and resolves to the default. Also the distribution cannot be maintained as a change in data types for fields used for distribution invalidates any current settings.- Specified by:
computeMetadatain classDeferredCompositeOperator- Parameters:
ctx- the context
-
generateSchema
public static RecordTokenType generateSchema(RecordTokenType leftType, RecordTokenType rightType, UnionAll.UnionMode mode, boolean keepExtraFields) Generate a schema for the union of two records.- Parameters:
leftType- the left record typerightType- the right record typemode- the union modekeepExtraFields- if true fields only present on one side of the input will be retained- Returns:
- the combined record type
-
compose
Description copied from class:DeferredCompositeOperatorCompose the body of this operator. Implementations should do the following:- Instantiate and configure sub-operators, adding them to the provided context via
the method
OperatorComposable.add(O) - Create necessary connections via the method
OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators output ports to the composite's output ports
- Specified by:
composein classDeferredCompositeOperator- Parameters:
ctx- the context
- Instantiate and configure sub-operators, adding them to the provided context via
the method
-