- java.lang.Object
-
- com.pervasive.datarush.operators.AbstractLogicalOperator
-
- com.pervasive.datarush.operators.CompositeOperator
-
- com.pervasive.datarush.operators.AbstractRecordCompositeOperator
-
- com.pervasive.datarush.operators.select.SampleRandomRows
-
- All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class SampleRandomRows extends AbstractRecordCompositeOperator
Apply random sampling to the input data. The schema of the output data matches that of the input. The output data usually contains fewer rows than the input. The number of rows output varies depending on the value of the percent or the sampleSize property.
The sampling can be executed in one of two modes:
- BY_PERCENT: the specified percentage of rows will be output
- BY_SIZE: the rows output depend on the given sample size and the total number of rows of the input data
For example, using BY_PERCENT mode with 10000 input rows and percent set to 0.25, you can expect approximately 2500 rows of output. This value is not exact. It will vary with different settings of the seed property.
In contrast, using BY_SIZE mode with any input data size and sampleSize set to 2500, you can expect approximately 2500 rows of output. This value is not exact. It will vary with different settings of the seed property. Use BY_SIZE when you want to have a specific number of rows output. The sampleSize property sets an upper limit on the number of rows that will be output.
The seed property is set to the current time (System.currentTimeMillis()) by default. Override this value to specify the random seed to use.
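To make the "approximately 2500 rows" behavior of BY_PERCENT concrete, the following is a minimal standalone sketch. It does not use the DataRush API and is not the operator's implementation. It assumes each row is kept independently with probability percent, using java.util.Random seeded with the given value. The class name PercentSamplingSketch and the helpers rowIds and sampleByPercent are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Standalone illustration only; this is not the DataRush implementation. */
class PercentSamplingSketch {

    /** Builds the row ids 0..count-1 to stand in for input records. */
    static List<Integer> rowIds(int count) {
        List<Integer> rows = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            rows.add(i);
        }
        return rows;
    }

    /**
     * Keeps each row independently with probability percent.
     * Assumption: percent is a fraction strictly between 0 and 1.
     */
    static <T> List<T> sampleByPercent(List<T> rows, double percent, long seed) {
        if (!(percent > 0.0 && percent < 1.0)) {
            throw new IllegalArgumentException("percent must satisfy 0 < percent < 1.0");
        }
        Random rng = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // One random draw per row decides whether the row is output.
            if (rng.nextDouble() < percent) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = rowIds(10_000);
        // The counts land near 2500 and generally differ between seeds.
        System.out.println("seed 1: " + sampleByPercent(rows, 0.25, 1L).size() + " rows");
        System.out.println("seed 2: " + sampleByPercent(rows, 0.25, 2L).size() + " rows");
    }
}
```

Under this assumption, repeating a run with the same seed reproduces the same sample, which is the practical reason to override the time-based default.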
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator
input, output
-
-
Constructor Summary
Constructors
- SampleRandomRows(): Performs default random sampling on the data.
- SampleRandomRows(double percent, long seed): Perform random sampling selecting a fixed percentage of the input.
- SampleRandomRows(long sampleSize, long seed): Perform random sampling selecting a fixed number of records from the input data.
-
Method Summary
- protected void compose(CompositionContext ctx): Compose the body of this operator.
- RecordPort getInput(): Returns the input port
- SampleMode getMode(): Get the sample mode.
- RecordPort getOutput(): Returns the output port
- double getPercent(): Get the percentage of input data to output.
- long getSampleSize(): Get the sample size (in rows) of data wanted.
- Long getSeed(): Get the random number generator seed value.
- void setMode(SampleMode mode): Set the sample mode.
- void setPercent(double percent): Set the percentage of input data wanted.
- void setSampleSize(long sampleSize): Set the wanted sample size in rows.
- void setSeed(Long seed): Set the random number generator seed value.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
SampleRandomRows
public SampleRandomRows()
Performs default random sampling on the data. By default, sampling will select a fixed percentage of the input.
-
SampleRandomRows
public SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
Parameters:
percent - percentage of the input data wanted
seed - seed value for the random number generator
-
SampleRandomRows
public SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
Parameters:
sampleSize - the wanted output sample size (in rows)
seed - seed value for the random number generator
-
-
Method Detail
-
getInput
public RecordPort getInput()
Description copied from interface: PipelineOperator
Returns the input port
Specified by:
getInput in interface PipelineOperator<RecordPort>
Overrides:
getInput in class AbstractRecordCompositeOperator
Returns:
the input port
-
getOutput
public RecordPort getOutput()
Description copied from interface: PipelineOperator
Returns the output port
Specified by:
getOutput in interface PipelineOperator<RecordPort>
Overrides:
getOutput in class AbstractRecordCompositeOperator
Returns:
the output port
-
getSeed
public Long getSeed()
Get the random number generator seed value.
Returns:
random number generator seed value
-
setSeed
public void setSeed(Long seed)
Set the random number generator seed value.
Parameters:
seed - random number generator seed value
-
getPercent
public double getPercent()
Get the percentage of input data to output.
Returns:
percentage of input data
-
setPercent
public void setPercent(double percent)
Set the percentage of input data wanted. This value must be in the range: 0 < percent < 1.0. This value is only used if the sample mode is BY_PERCENT.
Parameters:
percent - percentage of input data
-
getSampleSize
public long getSampleSize()
Get the sample size (in rows) of data wanted.
Returns:
sample size
-
setSampleSize
public void setSampleSize(long sampleSize)
Set the wanted sample size in rows. Set this value when using the BY_SIZE sample mode. The operator will output approximately the sample size number of rows.
Parameters:
sampleSize - wanted sample size
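One way to picture how BY_SIZE can produce "approximately" sampleSize rows while treating sampleSize as an upper limit is sketched below. This is an assumption about the general technique, not the operator's actual algorithm. The selection probability is derived as sampleSize divided by the total row count, and output stops once sampleSize rows have been kept. The names SizeSamplingSketch, rowIds and sampleBySize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Standalone illustration only; this is not the DataRush implementation. */
class SizeSamplingSketch {

    /** Builds the row ids 0..count-1 to stand in for input records. */
    static List<Integer> rowIds(int count) {
        List<Integer> rows = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            rows.add(i);
        }
        return rows;
    }

    /**
     * Assumed scheme: keep each row with probability sampleSize / totalRows,
     * and never emit more than sampleSize rows.
     */
    static <T> List<T> sampleBySize(List<T> rows, long sampleSize, long seed) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        // The probability depends on both the wanted size and the input size.
        double probability = Math.min(1.0, (double) sampleSize / rows.size());
        Random rng = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (kept.size() >= sampleSize) {
                break; // sampleSize acts as an upper limit on the output
            }
            if (rng.nextDouble() < probability) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int kept = sampleBySize(rowIds(10_000), 2500L, 1L).size();
        System.out.println("kept " + kept + " of 10000 rows (limit 2500)");
    }
}
```

In this sketch, stopping at the cap slightly favors earlier rows. A reservoir sampler would be the usual alternative when an exact, unbiased sample size is required.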
-
getMode
public SampleMode getMode()
Get the sample mode.
Returns:
sample mode
-
setMode
public void setMode(SampleMode mode)
Set the sample mode.
Parameters:
mode - sample mode
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
Specified by:
compose in class CompositeOperator
Parameters:
ctx - the context
-
-