java.lang.Object
  com.pervasive.datarush.operators.AbstractLogicalOperator
    com.pervasive.datarush.operators.CompositeOperator
      com.pervasive.datarush.operators.AbstractRecordCompositeOperator
        com.pervasive.datarush.operators.select.SampleRandomRows
All Implemented Interfaces:
LogicalOperator, PipelineOperator<RecordPort>, RecordPipelineOperator
public class SampleRandomRows extends AbstractRecordCompositeOperator
Apply random sampling to the input data. The schema of the output data matches that of the input. The output data usually contains fewer rows than the input. The number of rows output varies depending on the value of the percent or the sampleSize property.
The sampling can be executed in one of two modes:
- BY_PERCENT: the specified percentage of rows will be output
- BY_SIZE: the rows output depend on the given sample size and the total number of rows of the input data
For example, using BY_PERCENT mode with 10000 input rows and percent set to 0.25, you can expect approximately 2500 rows of output. This value is not exact. It will vary with different settings of the seed property.
In contrast, using BY_SIZE mode with any input data size and sampleSize set to 2500, you can expect approximately 2500 rows of output. This value is not exact. It will vary with different settings of the seed property. Use BY_SIZE when you want to have a specific number of rows output. The sampleSize property sets an upper limit on the number of rows that will be output.
The seed property is set to the current time (System.currentTimeMillis()) by default. Override this value to specify the random seed to use.
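The behavior of the two modes can be illustrated with a small stand-alone sketch. It does not use the DataRush API and is not the operator's implementation: the class SamplingModesSketch and its methods are hypothetical, and the BY_SIZE analogue assumes that the keep probability is derived as sampleSize divided by the total row count, with output capped at sampleSize. It only shows why the row counts are approximate and why a fixed seed makes a run repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Stand-alone illustration only; this is not the DataRush implementation. */
final class SamplingModesSketch {

    /** BY_PERCENT analogue: keep each row independently with probability percent. */
    static List<Integer> sampleByPercent(int totalRows, double percent, long seed) {
        Random random = new Random(seed);
        List<Integer> kept = new ArrayList<>();
        for (int row = 0; row < totalRows; row++) {
            if (random.nextDouble() < percent) {
                kept.add(row);
            }
        }
        return kept;
    }

    /**
     * BY_SIZE analogue (assumed behavior): derive the keep probability from
     * sampleSize / totalRows and never emit more than sampleSize rows.
     */
    static List<Integer> sampleBySize(int totalRows, long sampleSize, long seed) {
        double probability = Math.min(1.0, (double) sampleSize / totalRows);
        Random random = new Random(seed);
        List<Integer> kept = new ArrayList<>();
        for (int row = 0; row < totalRows && kept.size() < sampleSize; row++) {
            if (random.nextDouble() < probability) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row counts are close to 2500 but not exactly 2500; they change with the seed.
        System.out.println(sampleByPercent(10_000, 0.25, 42L).size());
        System.out.println(sampleBySize(10_000, 2_500L, 42L).size());
    }
}
```

Running main with a different seed changes both counts slightly, while the BY_SIZE analogue never exceeds the requested size.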
-
-
Field Summary
-
Fields inherited from class com.pervasive.datarush.operators.AbstractRecordCompositeOperator
input, output
-
-
Constructor Summary
Constructor and Description
SampleRandomRows()
Performs default random sampling on the data.
SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
-
Method Summary
Modifier and Type, Method, Description
protected void compose(CompositionContext ctx)
Compose the body of this operator.
RecordPort getInput()
Returns the input port.
SampleMode getMode()
Get the sample mode.
RecordPort getOutput()
Returns the output port.
double getPercent()
Get the percentage of input data to output.
long getSampleSize()
Get the sample size (in rows) of data wanted.
Long getSeed()
Get the random number generator seed value.
void setMode(SampleMode mode)
Set the sample mode.
void setPercent(double percent)
Set the percentage of input data wanted.
void setSampleSize(long sampleSize)
Set the wanted sample size in rows.
void setSeed(Long seed)
Set the random number generator seed value.
-
Methods inherited from class com.pervasive.datarush.operators.AbstractLogicalOperator
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.pervasive.datarush.operators.LogicalOperator
disableParallelism, getInputPorts, getOutputPorts
-
-
-
-
Constructor Detail
-
SampleRandomRows
public SampleRandomRows()
Performs default random sampling on the data. By default, sampling will select a fixed percentage of the input.
-
SampleRandomRows
public SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
Parameters:
percent - percentage of the input data wanted
seed - seed value for the random number generator
-
SampleRandomRows
public SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
Parameters:
sampleSize - the wanted output sample size (in rows)
seed - seed value for the random number generator
-
-
Method Detail
-
getInput
public RecordPort getInput()
Description copied from interface: PipelineOperator
Returns the input port.
Specified by:
getInput in interface PipelineOperator<RecordPort>
Overrides:
getInput in class AbstractRecordCompositeOperator
Returns:
the input port
-
getOutput
public RecordPort getOutput()
Description copied from interface: PipelineOperator
Returns the output port.
Specified by:
getOutput in interface PipelineOperator<RecordPort>
Overrides:
getOutput in class AbstractRecordCompositeOperator
Returns:
the output port
-
getSeed
public Long getSeed()
Get the random number generator seed value.
Returns:
random number generator seed value
-
setSeed
public void setSeed(Long seed)
Set the random number generator seed value.
Parameters:
seed - random number generator seed value
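The effect of the seed can be sketched without the DataRush library. The class SeedSketch below is hypothetical. It assumes, purely for illustration, that an absent (null) seed means "use the current time"; the documentation only states that the default value is System.currentTimeMillis(). The point is that a fixed seed gives a repeatable row count, while the time-based default generally does not.

```java
import java.util.Random;

/** Stand-alone illustration only; this is not the DataRush implementation. */
final class SeedSketch {

    /** Hypothetical helper: fall back to the current time when no seed is given. */
    static long resolveSeed(Long seed) {
        return seed != null ? seed : System.currentTimeMillis();
    }

    /** Counts how many of totalRows are kept when each row is kept with probability percent. */
    static int countKept(int totalRows, double percent, Long seed) {
        Random random = new Random(resolveSeed(seed));
        int kept = 0;
        for (int row = 0; row < totalRows; row++) {
            if (random.nextDouble() < percent) {
                kept++;
            }
        }
        return kept;
    }
}
```

Calling countKept(10_000, 0.25, 7L) twice returns the same count both times; passing null instead of 7L may return a different count on each run.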
-
getPercent
public double getPercent()
Get the percentage of input data to output.
Returns:
percentage of input data
-
setPercent
public void setPercent(double percent)
Set the percentage of input data wanted, expressed as a fraction. This value must be in the range: 0 < percent < 1.0. This value is only used if the sample mode is BY_PERCENT.
Parameters:
percent - percentage of input data
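Because the value is a fraction rather than a whole-number percentage, 0.25 means 25% and 25.0 is out of range. A minimal stand-alone sketch of such a range check follows. The class PercentRangeSketch is hypothetical, and the exception type is an assumption; the documentation does not say how the operator reports an invalid value.

```java
/** Stand-alone illustration only; this is not the DataRush implementation. */
final class PercentRangeSketch {

    /** Returns percent unchanged if 0 < percent < 1.0, otherwise throws. */
    static double requirePercentInRange(double percent) {
        // Written as a negated conjunction so that NaN also fails the check.
        if (!(percent > 0.0 && percent < 1.0)) {
            throw new IllegalArgumentException(
                "percent must satisfy 0 < percent < 1.0, got " + percent);
        }
        return percent;
    }

    /** Expected (not exact) number of output rows for a given input size. */
    static long expectedRows(long totalRows, double percent) {
        return Math.round(totalRows * requirePercentInRange(percent));
    }
}
```

For 10000 input rows and a percent of 0.25, expectedRows gives 2500, matching the example in the class description; the actual count varies with the seed.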
-
getSampleSize
public long getSampleSize()
Get the sample size (in rows) of data wanted.
Returns:
sample size
-
setSampleSize
public void setSampleSize(long sampleSize)
Set the wanted sample size in rows. Set this value when using the BY_SIZE sample mode. The operator will output approximately sampleSize rows.
Parameters:
sampleSize - wanted sample size
-
getMode
public SampleMode getMode()
Get the sample mode.
Returns:
sample mode
-
setMode
public void setMode(SampleMode mode)
Set the sample mode.
Parameters:
mode - sample mode
-
compose
protected void compose(CompositionContext ctx)
Description copied from class: CompositeOperator
Compose the body of this operator. Implementations should do the following:
- Perform any validation of configuration, input types, etc.
- Instantiate and configure sub-operators, adding them to the provided context via the method OperatorComposable.add(O)
- Create necessary connections via the method OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports
Specified by:
compose in class CompositeOperator
Parameters:
ctx - the context
-
-