- java.lang.Object
  - com.pervasive.datarush.operators.AbstractLogicalOperator
    - com.pervasive.datarush.operators.StreamingOperator
      - com.pervasive.datarush.operators.ExecutableOperator
        - com.pervasive.datarush.operators.source.GenerateRandom
All Implemented Interfaces:
LogicalOperator, RecordSourceOperator, SourceOperator<RecordPort>
public final class GenerateRandom extends ExecutableOperator implements RecordSourceOperator
Generates rows of random data. All field types except generic and object are supported.
The generated data for each field type does not, in general, cover the full range of that type supported by the dataflow system. It does, however, cover a range that any operator claiming to support that type should be able to handle.
- boolean: Either true or false
- binary: Between 1 and 2048 random bytes, with a uniform distribution of the number of bytes.
- char: Printable ASCII characters 32-126. Arbitrary Unicode is not generated, because "valid Unicode" is not well defined.
- date: The range of days representable by a Date, +-2^63 milliseconds from 1970-01-01, corresponding to about +-292 million years.
- double: The range of Double, excluding NaN and +-Inf. (NaN values can be generated by setting the null probability above 0.0; see setNullProbability.)
- float: The range of Float, excluding NaN and +-Inf. (NaN values can be generated by setting the null probability above 0.0; see setNullProbability.)
- int: The full range of Integer.
- long: The full range of a Java Long.
- numeric: An unscaled integer is formed from 1 to 100 random binary digits, corresponding to up to 31 decimal digits (2^100 - 1 = 1,267,650,600,228,229,401,496,703,205,375). This value is then divided by 10^scale, where the scale is 0 to 29, and made negative with a probability of 50%.
- string: Zero or more random ASCII characters (see also the char type). The string length is unbounded, but its probability decays geometrically: a string is at least n characters long with probability 0.9^n. So 10% are empty strings, 9% are 1 character long, 8.1% are 2 characters long, 7.29% are 3 characters long, and so on.
- timestamp: The range of seconds is the range representable by a Timestamp, +-2^63 milliseconds from 1970-01-01, corresponding to about +-292 million years. The nanoseconds field ranges from 0 to 999999999. The time zone offset is a whole number of minutes in the range -12:00 to +12:59 (the extra hour allows for daylight saving time).
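The distributions above can be illustrated with a small stand-alone sketch. It is not GenerateRandom's implementation. The class RandomValueSketch and its methods are hypothetical, and java.util.Random stands in for whatever generator the operator actually uses. The sketch does three things:

- It samples strings with the geometric length distribution described for the string type.
- It builds numeric values the way the numeric bullet describes.
- It checks the roughly 292 million year figure quoted for date and timestamp against Long.MAX_VALUE milliseconds.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Random;

// Hypothetical sketch of the documented distributions; not the GenerateRandom source.
class RandomValueSketch {

    // char: printable ASCII codes 32..126 (95 possible values).
    static char randomChar(Random rnd) {
        return (char) (32 + rnd.nextInt(95));
    }

    // string: keep appending with probability 0.9, so P(length >= n) = 0.9^n
    // and P(length == n) = 0.1 * 0.9^n (10% empty, 9% one char, 8.1% two chars, ...).
    static String randomString(Random rnd) {
        StringBuilder sb = new StringBuilder();
        while (rnd.nextDouble() < 0.9) {
            sb.append(randomChar(rnd));
        }
        return sb.toString();
    }

    // numeric: an unscaled integer of 1..100 random bits, divided by 10^scale
    // (scale 0..29), then negated with probability 0.5.
    static BigDecimal randomNumeric(Random rnd) {
        int bits = 1 + rnd.nextInt(100);                    // 1..100 binary digits
        BigInteger unscaled = new BigInteger(bits, rnd);    // uniform in [0, 2^bits - 1]
        int scale = rnd.nextInt(30);                        // 0..29
        BigDecimal value = new BigDecimal(unscaled, scale); // unscaled / 10^scale
        return rnd.nextBoolean() ? value.negate() : value;
    }

    // date/timestamp: Long.MAX_VALUE (2^63 - 1) milliseconds expressed in Julian years.
    static double maxYearsFromEpoch() {
        double msPerYear = 1000.0 * 60 * 60 * 24 * 365.25;
        return Long.MAX_VALUE / msPerYear;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        int trials = 100_000;
        int empty = 0;
        long totalLength = 0;
        for (int i = 0; i < trials; i++) {
            String s = randomString(rnd);
            if (s.isEmpty()) {
                empty++;
            }
            totalLength += s.length();
        }
        // Expected: about 0.100 empty and a mean length of about 0.9 / 0.1 = 9.
        System.out.printf("empty fraction: %.3f%n", empty / (double) trials);
        System.out.printf("mean length:    %.2f%n", totalLength / (double) trials);
        System.out.println("sample numeric: " + randomNumeric(rnd));
        System.out.printf("years from epoch: %.3e%n", maxYearsFromEpoch());
    }
}
```

The string sampler encodes the geometric law directly: each additional character is appended with probability 0.9. That reproduces the 10%, 9%, 8.1%, ... percentages, with a mean length of 9.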
Constructor Summary
- GenerateRandom(): The default constructor.
- GenerateRandom(RecordTokenType type, long rowCount): Creates a new instance of GenerateRandom, specifying the minimal set of required parameters.
Method Summary
- protected void computeMetadata(StreamingMetadataContext ctx): Implementations must adhere to the contracts listed in the method detail.
- protected void execute(ExecutionContext ctx): Executes the operator.
- double getNullProbability(): Gets the probability that any given generated token will be null valued.
- RecordPort getOutput(): Gets the record port providing the output data from the source.
- RecordTokenType getOutputType(): Returns the data type of the generated values.
- long getRowCount(): Returns the number of rows to generate.
- long getSeed(): Gets the seed for the random number generator.
- void setNullProbability(double nullProbability): Sets the probability that any given generated token will be null valued.
- void setOutputType(RecordTokenType outputType): Sets the data type of the generated values.
- void setRowCount(long rowCount): Sets the number of rows to generate.
- void setSeed(long seed): Sets the seed for the random number generator.
Methods inherited from class com.pervasive.datarush.operators.ExecutableOperator
cloneForExecution, getNumInputCopies, getPortSettings, handleInactiveOutput

Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
Constructor Detail

GenerateRandom
public GenerateRandom()
The default constructor. Prior to graph compilation, the following properties should be specified:
- outputType (see setOutputType)
- rowCount (see setRowCount)

GenerateRandom
public GenerateRandom(RecordTokenType type, long rowCount)
Creates a new instance of GenerateRandom, specifying the minimal set of required parameters.
- Parameters:
  type - the output type
  rowCount - the number of rows
Method Detail

getOutput
public RecordPort getOutput()
Description copied from interface: RecordSourceOperator
Gets the record port providing the output data from the source.
- Specified by:
  getOutput in interface RecordSourceOperator
  getOutput in interface SourceOperator<RecordPort>
- Returns:
  the output port for the source

getSeed
public long getSeed()
Gets the seed for the random number generator.
- Returns:
  the seed for the random number generator

getNullProbability
public double getNullProbability()
Gets the probability that any given generated token will be null valued.
- Returns:
  the probability that any given generated token will be null valued

setSeed
public void setSeed(long seed)
Sets the seed for the random number generator.
- Parameters:
  seed - the seed for the random number generator
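This page does not say how the seed is used. Assuming it seeds a conventional pseudo-random generator, fixing the seed should make the generated data reproducible from run to run.

The sketch below illustrates that property with java.util.Random as a stand-in. The class SeedSketch and its draw method are hypothetical. They say nothing about the operator's actual generator or about how it might seed parallel partitions.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of seed reproducibility; not GenerateRandom's generator.
class SeedSketch {

    // Draws n ints from a fresh java.util.Random seeded with the given seed.
    static int[] draw(long seed, int n) {
        Random rnd = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = rnd.nextInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = draw(12345L, 5);
        int[] second = draw(12345L, 5);
        // Same seed, same sequence: prints true.
        System.out.println(Arrays.equals(first, second));
        // A different seed gives an unrelated sequence.
        System.out.println(Arrays.toString(draw(54321L, 5)));
    }
}
```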

setNullProbability
public void setNullProbability(double nullProbability)
Sets the probability that any given generated token will be null valued.
- Parameters:
  nullProbability - the probability that any given generated token will be null valued
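One reading of "the probability any given generated token will be null valued" is that each token is independently replaced by null with that probability. The sketch below assumes exactly that.

The names NullProbabilitySketch and maybeNull are hypothetical. The range check is also an assumption, not documented behavior of setNullProbability.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch of a per-token null probability; not GenerateRandom's code.
class NullProbabilitySketch {

    // Returns null with probability nullProbability, otherwise a freshly generated value.
    static <T> T maybeNull(Random rnd, double nullProbability, Supplier<T> generator) {
        // Assumed validation: a probability must lie in [0.0, 1.0].
        if (nullProbability < 0.0 || nullProbability > 1.0) {
            throw new IllegalArgumentException("nullProbability must be in [0.0, 1.0]");
        }
        // nextDouble() is uniform in [0.0, 1.0), so this is true with probability nullProbability.
        return rnd.nextDouble() < nullProbability ? null : generator.get();
    }

    public static void main(String[] args) {
        Random rnd = new Random(2024L);
        int trials = 100_000;
        int nulls = 0;
        for (int i = 0; i < trials; i++) {
            Integer token = maybeNull(rnd, 0.25, rnd::nextInt);
            if (token == null) {
                nulls++;
            }
        }
        // Expected to be close to 0.25.
        System.out.printf("null fraction: %.3f%n", nulls / (double) trials);
    }
}
```

Under this reading, a null probability of 0.0 never yields null and 1.0 always does. Intermediate values give the requested fraction on average.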

getOutputType
public RecordTokenType getOutputType()
Returns the data type of the generated values.
- Returns:
  the data type of the generated values

setOutputType
public void setOutputType(RecordTokenType outputType)
Sets the data type of the generated values.
- Parameters:
  outputType - the data type of the generated values

getRowCount
public long getRowCount()
Returns the number of rows to generate.
- Returns:
  the number of rows to generate

setRowCount
public void setRowCount(long rowCount)
Sets the number of rows to generate.
- Parameters:
  rowCount - the number of rows to generate

computeMetadata
protected void computeMetadata(StreamingMetadataContext ctx)
Description copied from class: StreamingOperator
Implementations must adhere to the following contracts.

General
Regardless of input and output port types, all implementations must do the following:
- Validation: Validation of configuration should always be performed first.
- Declare parallelizability: Implementations must declare parallelizability by calling StreamingMetadataContext.parallelize(ParallelismStrategy).

Input record ports
Implementations with input record ports must declare the following:
- Required data ordering: Implementations that have data ordering requirements must declare them by calling RecordPort#setRequiredDataOrdering; otherwise data may arrive in any order.
- Required data distribution (only applies to parallelizable operators): Implementations that have data distribution requirements must declare them by calling RecordPort#setRequiredDataDistribution; otherwise data will arrive in an unspecified partial distribution.
Implementations may also inspect the upstream distribution and ordering by calling RecordPort#getSourceDataDistribution and RecordPort#getSourceDataOrdering. These should be viewed as hints to help choose a more efficient algorithm. Even so, operators must still declare their data ordering and data distribution requirements; otherwise there is no guarantee that data will arrive sorted or distributed as required.

Output record ports
Implementations with output record ports must declare the following:
- Type: Implementations must declare their output type by calling RecordPort#setType.
- Output data ordering: Implementations that can make guarantees as to their output ordering may do so by calling RecordPort#setOutputDataOrdering.
- Output data distribution (only applies to parallelizable operators): Implementations that can make guarantees as to their output distribution may do so by calling RecordPort#setOutputDataDistribution.

Input model ports
In general, there is nothing special to declare for input model ports. Models are implicitly duplicated to all partitions when going from non-parallel to parallel operators. A model going from a parallel to a non-parallel node is the special case of a "model reducer" operator. For a model reducer, the downstream operator must declare the following:
- Merge handler: Model reducers must declare a merge handler by calling AbstractModelPort#setMergeHandler.
MergeModel is a convenient, reusable model reducer, parameterized with a merge handler.

Output model ports
SimpleModelPorts have no associated metadata, so there is never any output metadata to declare for them. PMMLPorts, on the other hand, do have associated metadata. For all PMMLPorts, implementations must declare the following:
- pmmlModelSpec: Implementations must declare the PMML model spec by calling PMMLPort.setPMMLModelSpec.

- Specified by:
  computeMetadata in class StreamingOperator
- Parameters:
  ctx - the context

execute
protected void execute(ExecutionContext ctx)
Description copied from class: ExecutableOperator
Executes the operator. Implementations should adhere to the following contracts:
- Following execution, all input ports must be at end-of-data.
- Following execution, all output ports must be at end-of-data.
- Specified by:
  execute in class ExecutableOperator
- Parameters:
  ctx - context in which to look up physical ports bound to logical ports