- All Implemented Interfaces:
LogicalOperator,PipelineOperator<RecordPort>,RecordPipelineOperator
Randomly samples rows from the input data. The amount of output is controlled by either the percent or the sampleSize property.
The sampling can be executed in one of two modes:
- BY_PERCENT: the specified percentage of rows will be output
- BY_SIZE: the rows output depend on the given sample size and the total number of rows of the input data
For example, using BY_PERCENT mode with 10000 input rows and percent set to 0.25, you can
expect approximately 2500 rows of output. This value is not exact. It will vary with different
settings of the seed property.
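The variation described above is what one would see if each row were kept independently with probability percent. The following is a minimal sketch of that idea using java.util.Random; it is an assumption made for illustration, not the operator's actual implementation, and the class and method names (PercentSampleSketch, countSampled) are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch only; not the DataRush implementation. */
final class PercentSampleSketch {

    /** Counts the rows kept when each row is retained independently with probability percent. */
    static long countSampled(long totalRows, double percent, long seed) {
        Random rng = new Random(seed);
        long kept = 0;
        for (long row = 0; row < totalRows; row++) {
            // nextDouble() is uniform in [0, 1), so this test passes with probability percent
            if (rng.nextDouble() < percent) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 10000 input rows at 0.25: each seed gives a count near, but rarely exactly, 2500
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println("seed " + seed + " -> " + countSampled(10_000, 0.25, seed) + " rows");
        }
    }
}
```

Under this assumption the count is close to 2500 for every seed, but the exact value changes from seed to seed.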
In contrast, using BY_SIZE mode with any input data size and sampleSize set to 2500, you can
expect approximately 2500 rows of output. This value is not exact. It will vary with different
settings of the seed property. Use BY_SIZE when you want to
have a specific number of rows output. The sampleSize property sets an upper limit on the
number of rows that will be output.
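One way to read the BY_SIZE description is that the per-row keep probability is derived from sampleSize and the total row count, with sampleSize acting as a hard cap. The sketch below encodes that reading; it is an assumption for illustration only, the source does not state how the operator computes its sample, and the names (SizeSampleSketch, countSampledBySize) are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch only; the real operator may compute its sample differently. */
final class SizeSampleSketch {

    /** Keeps rows with probability sampleSize / totalRows and never returns more than sampleSize. */
    static long countSampledBySize(long totalRows, long sampleSize, long seed) {
        if (totalRows <= 0 || sampleSize <= 0) {
            return 0;
        }
        // Assumed rule: probability derived from the wanted size and the input size
        double probability = Math.min(1.0, (double) sampleSize / totalRows);
        Random rng = new Random(seed);
        long kept = 0;
        for (long row = 0; row < totalRows && kept < sampleSize; row++) {
            if (rng.nextDouble() < probability) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Different input sizes, same wanted sample size of 2500 rows
        System.out.println(countSampledBySize(10_000, 2_500, 7L));
        System.out.println(countSampledBySize(1_000_000, 2_500, 7L));
    }
}
```

In this sketch the output stays near 2500 for large inputs and cannot exceed it, which matches the upper-limit behavior described for sampleSize.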
The seed property is set to the current time (System.currentTimeMillis())
by default. Override this value to specify the random seed to use.
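The practical effect of fixing the seed can be shown with a plain java.util.Random: the same seed selects the same rows on every run, while a time-based default changes between runs. This is a sketch under the assumption that sampling is driven by a seeded pseudo-random generator; SeedSketch and sampledRows are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; shows why a fixed seed makes a sample repeatable. */
final class SeedSketch {

    /** Returns the indexes of the rows kept for the given seed. */
    static List<Integer> sampledRows(int totalRows, double percent, long seed) {
        Random rng = new Random(seed);
        List<Integer> kept = new ArrayList<>();
        for (int row = 0; row < totalRows; row++) {
            if (rng.nextDouble() < percent) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long fixedSeed = 12345L;
        // Same seed, same selection
        boolean repeatable = sampledRows(20, 0.25, fixedSeed).equals(sampledRows(20, 0.25, fixedSeed));
        System.out.println("fixed seed repeatable: " + repeatable);

        // Time-based seed, as with the documented default: selection differs between runs
        long timeSeed = System.currentTimeMillis();
        System.out.println("time-based seed picks: " + sampledRows(20, 0.25, timeSeed));
    }
}
```

Setting an explicit seed is therefore the option to choose when a sample must be reproducible, for example in tests.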
-
Field Summary
Fields inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator:
input, output
Constructor Summary
Constructors
- SampleRandomRows(): Performs default random sampling on the data.
- SampleRandomRows(double percent, long seed): Perform random sampling selecting a fixed percentage of the input.
- SampleRandomRows(long sampleSize, long seed): Perform random sampling selecting a fixed number of records from the input data.
Method Summary
Methods
- protected void compose(ctx): Compose the body of this operator.
- getInput(): Returns the input port.
- getMode(): Get the sample mode.
- getOutput(): Returns the output port.
- double getPercent(): Get the percentage of input data to output.
- long getSampleSize(): Get the sample size (in rows) of data wanted.
- getSeed(): Get the random number generator seed value.
- void setMode(SampleMode mode): Set the sample mode.
- void setPercent(double percent): Set the percentage of input data wanted.
- void setSampleSize(long sampleSize): Set the wanted sample size in rows.
- void setSeed(seed): Set the random number generator seed value.
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator:
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator:
disableParallelism, getInputPorts, getOutputPorts
-
Constructor Details
-
SampleRandomRows
public SampleRandomRows()
Performs default random sampling on the data. By default, sampling will select a fixed percentage of the input.
-
SampleRandomRows
public SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
Parameters:
- percent: percentage of the input data wanted
- seed: seed value for the random number generator
-
SampleRandomRows
public SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
Parameters:
- sampleSize: the wanted output sample size (in rows)
- seed: seed value for the random number generator
-
-
Method Details
-
getInput
Description copied from interface: PipelineOperator
Returns the input port.
Specified by:
- getInput in interface PipelineOperator<RecordPort>
Overrides:
- getInput in class AbstractRecordCompositeOperator
Returns:
- the input port
-
getOutput
Description copied from interface: PipelineOperator
Returns the output port.
Specified by:
- getOutput in interface PipelineOperator<RecordPort>
Overrides:
- getOutput in class AbstractRecordCompositeOperator
Returns:
- the output port
-
getSeed
Get the random number generator seed value.
Returns:
- random number generator seed value
-
setSeed
Set the random number generator seed value.
Parameters:
- seed: random number generator seed value
-
getPercent
public double getPercent()
Get the percentage of input data to output.
Returns:
- percentage of input data
-
setPercent
public void setPercent(double percent)
Set the percentage of input data wanted. This value must be in the range: 0 < percent < 1.0. This value is only used if the sample mode is BY_PERCENT.
Parameters:
- percent: percentage of input data
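The range restriction on the percent value can be checked before configuring the operator. The helper below is a hypothetical caller-side guard, not part of the library; the source does not say how the operator itself reports an out-of-range value, so the use of IllegalArgumentException here is an assumption.

```java
/** Illustrative sketch only; a caller-side guard for the documented percent range. */
final class PercentRangeSketch {

    /** Returns percent unchanged when it lies strictly between 0 and 1.0, otherwise throws. */
    static double checkPercent(double percent) {
        // Written as a negated conjunction so that NaN is rejected too
        if (!(percent > 0.0 && percent < 1.0)) {
            throw new IllegalArgumentException(
                    "percent must satisfy 0 < percent < 1.0, got " + percent);
        }
        return percent;
    }

    public static void main(String[] args) {
        System.out.println(checkPercent(0.25)); // accepted
        try {
            checkPercent(25.0); // a value given as 25 instead of 0.25 is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A guard like this catches the common mistake of passing 25 when 0.25 is meant.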
-
getSampleSize
public long getSampleSize()
Get the sample size (in rows) of data wanted.
Returns:
- sample size
-
setSampleSize
public void setSampleSize(long sampleSize)
Set the wanted sample size in rows. Set this value when using the BY_SIZE sample mode. The operator will output approximately the sample size number of rows.
Parameters:
- sampleSize: wanted sample size
-
getMode
Get the sample mode.
Returns:
- sample mode
-
setMode
Set the sample mode.
Parameters:
- mode: sample mode
-
compose
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O).
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
Specified by:
- compose in class CompositeOperator
Parameters:
- ctx: the context
-