public class SampleRandomRows extends AbstractRecordCompositeOperator
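Performs random sampling of the rows in the input data. The amount of data sampled is controlled by either the percent or the sampleSize property.
The sampling can be executed in one of two modes: BY_PERCENT or BY_SIZE.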
For example, using BY_PERCENT mode with 10000 input rows and percent set to 0.25, you can expect approximately 2500 rows of output. This value is not exact; it varies with different settings of the seed property.
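The "approximately 2500 rows" behavior can be illustrated with a small stand-alone sketch. It does not use the operator. It assumes, purely for illustration, that each row is kept independently with probability percent, which may differ from what the operator does internally. The class name PercentSamplingSketch and the method countKept are hypothetical.

```java
import java.util.Random;

class PercentSamplingSketch {

    /**
     * Counts how many of rowCount rows are kept when each row is kept
     * independently with probability percent (a fraction between 0 and 1).
     * Assumption: this per-row coin flip is only an illustration of
     * percentage-based sampling, not the operator's documented algorithm.
     */
    static long countKept(long rowCount, double percent, long seed) {
        Random rng = new Random(seed);
        long kept = 0;
        for (long row = 0; row < rowCount; row++) {
            // nextDouble() is uniform in [0, 1), so this keeps about percent of the rows.
            if (rng.nextDouble() < percent) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same input size and percent, different seeds: the counts differ slightly.
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println("seed=" + seed + " kept=" + countKept(10_000, 0.25, seed));
        }
    }
}
```

Under this assumption the count for 10000 rows and percent 0.25 lands near 2500 without being fixed at it, and repeating a run with the same seed reproduces the same count.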
In contrast, using BY_SIZE mode with any input data size and sampleSize set to 2500, you can expect approximately 2500 rows of output. This value is not exact; it varies with different settings of the seed property. Use BY_SIZE when you want to target a specific number of output rows. The sampleSize property sets an upper limit on the number of rows that will be output.
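The text does not say how the operator arrives at a sample of about sampleSize rows. One way to get the described behavior (an approximate count that never exceeds sampleSize) is sketched below as an assumption only: keep each row with probability sampleSize / rowCount and stop once sampleSize rows are kept. The class name SizeSamplingSketch and the method sampleIndices are hypothetical, and the sketch assumes the input row count is known up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SizeSamplingSketch {

    /**
     * Returns the indices of the sampled rows. The result never holds more
     * than sampleSize entries. Assumption: this is an illustrative strategy,
     * not the operator's documented implementation.
     */
    static List<Long> sampleIndices(long rowCount, long sampleSize, long seed) {
        List<Long> picked = new ArrayList<>();
        if (rowCount <= 0 || sampleSize <= 0) {
            return picked;
        }
        // Probability chosen so that the expected number of kept rows is sampleSize.
        double keepProbability = Math.min(1.0, (double) sampleSize / rowCount);
        Random rng = new Random(seed);
        // Stop early once the upper limit is reached.
        for (long row = 0; row < rowCount && picked.size() < sampleSize; row++) {
            if (rng.nextDouble() < keepProbability) {
                picked.add(row);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 3; seed++) {
            int n = sampleIndices(100_000, 2_500, seed).size();
            System.out.println("seed=" + seed + " sampled=" + n);
        }
    }
}
```

Stopping at the limit slightly favors earlier rows whenever the cap is hit. A reservoir-style sampler would avoid that, but it returns exactly sampleSize rows rather than an approximate count.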
The seed property is set to the current time (System.currentTimeMillis()) by default. Override this value to specify the random seed to use.
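The effect of fixing the seed can be shown with java.util.Random alone. The sketch below is not the operator's code. The helper effectiveSeed is hypothetical and assumes that an unset seed is represented by null and falls back to System.currentTimeMillis().

```java
import java.util.Arrays;
import java.util.Random;

class SeedSketch {

    /**
     * Hypothetical helper: returns the configured seed, or the current
     * time in milliseconds when no seed was configured (null).
     */
    static long effectiveSeed(Long configuredSeed) {
        return (configuredSeed != null) ? configuredSeed : System.currentTimeMillis();
    }

    /** Draws n pseudo-random doubles from a generator created with the given seed. */
    static double[] draws(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        long fixed = effectiveSeed(12345L);
        // Two generators built from the same seed produce identical sequences.
        System.out.println("repeatable: " + Arrays.equals(draws(fixed, 5), draws(fixed, 5)));
        // Without a configured seed, the value depends on when the code runs.
        System.out.println("default seed: " + effectiveSeed(null));
    }
}
```

A fixed seed is useful when a sample has to be reproduced, for example in tests. A time-based default gives a different sample on each run.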
Fields inherited from class AbstractRecordCompositeOperator: input, output
Constructor | Description |
---|---|
SampleRandomRows() | Performs default random sampling on the data. |
SampleRandomRows(double percent, long seed) | Perform random sampling selecting a fixed percentage of the input. |
SampleRandomRows(long sampleSize, long seed) | Perform random sampling selecting a fixed number of records from the input data. |
Modifier and Type | Method | Description |
---|---|---|
protected void | compose(CompositionContext ctx) | Compose the body of this operator. |
RecordPort | getInput() | Returns the input port. |
SampleMode | getMode() | Get the sample mode. |
RecordPort | getOutput() | Returns the output port. |
double | getPercent() | Get the percentage of input data to output. |
long | getSampleSize() | Get the sample size (in rows) of data wanted. |
Long | getSeed() | Get the random number generator seed value. |
void | setMode(SampleMode mode) | Set the sample mode. |
void | setPercent(double percent) | Set the percentage of input data wanted. |
void | setSampleSize(long sampleSize) | Set the wanted sample size in rows. |
void | setSeed(Long seed) | Set the random number generator seed value. |
disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
disableParallelism, getInputPorts, getOutputPorts
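public SampleRandomRows()
Performs default random sampling on the data.
public SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
Parameters:
percent - percentage of the input data wanted
seed - seed value for the random number generator
public SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
Parameters:
sampleSize - the wanted output sample size (in rows)
seed - seed value for the random number generator
public RecordPort getInput()
Description copied from interface: PipelineOperator
Returns the input port.
getInput in interface PipelineOperator<RecordPort>
getInput in class AbstractRecordCompositeOperator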
public RecordPort getOutput()
Description copied from interface: PipelineOperator
Returns the output port.
getOutput in interface PipelineOperator<RecordPort>
getOutput in class AbstractRecordCompositeOperator
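public Long getSeed()
Get the random number generator seed value.
public void setSeed(Long seed)
Set the random number generator seed value.
Parameters:
seed - random number generator seed value
public double getPercent()
Get the percentage of input data to output.
public void setPercent(double percent)
Set the percentage of input data wanted. This property applies to the BY_PERCENT sample mode.
Parameters:
percent - percentage of input data
public long getSampleSize()
Get the sample size (in rows) of data wanted.
public void setSampleSize(long sampleSize)
Set the wanted sample size in rows. This property applies to the BY_SIZE sample mode. The operator will output approximately the sample size number of rows.
Parameters:
sampleSize - wanted sample size
public SampleMode getMode()
Get the sample mode.
public void setMode(SampleMode mode)
Set the sample mode.
Parameters:
mode - sample mode
protected void compose(CompositionContext ctx)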
Description copied from class: CompositeOperator
Compose the body of this operator. Sub-operators are added with OperatorComposable.add(O), and connections are created with OperatorComposable.connect(P, P). This includes connections from the composite's input ports to sub-operators, connections between sub-operators, and connections from sub-operators' output ports to the composite's output ports.
compose in class CompositeOperator
Parameters:
ctx - the context
Copyright © 2020 Actian Corporation. All rights reserved.