public class SampleRandomRows extends AbstractRecordCompositeOperator
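Performs random sampling on the input rows. The amount of data sampled is controlled by either the
percent or the sampleSize property.
The sampling can be executed in one of two modes:
- BY_PERCENT: outputs approximately the configured percentage of the input rows.
- BY_SIZE: outputs approximately the configured number of rows.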
For example, using BY_PERCENT mode with 10000 input rows and percent set to 0.25, you can
expect approximately 2500 rows of output. This value is not exact. It will vary with different
settings of the seed property.
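The variation described above can be illustrated with a minimal, self-contained sketch. It assumes each row is kept independently with probability percent, which is one common way to sample by percentage; it is not the operator's actual implementation. The class name PercentSamplingSketch and the method countSampled are hypothetical.

```java
import java.util.Random;

// Hypothetical illustration only; not the SampleRandomRows implementation.
class PercentSamplingSketch {

    // Counts how many of rowCount rows are kept when each row is kept
    // independently with probability percent (a fraction between 0 and 1).
    static int countSampled(int rowCount, double percent, long seed) {
        Random random = new Random(seed);
        int kept = 0;
        for (int row = 0; row < rowCount; row++) {
            if (random.nextDouble() < percent) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The count is close to 2500 for each seed, but usually not identical.
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println("seed " + seed + ": "
                    + countSampled(10_000, 0.25, seed) + " rows");
        }
    }
}
```

Running the sketch with the same seed always gives the same count, which is why fixing the seed makes a sample reproducible.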
In contrast, using BY_SIZE mode with any input data size and sampleSize set to 2500, you can
expect approximately 2500 rows of output. This value is not exact and will vary with different
settings of the seed property. Use BY_SIZE when you want a specific number of output rows
regardless of the input size. The sampleSize property sets an upper limit on the
number of rows that will be output.
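One way to obtain "approximately sampleSize rows, never more" is sketched below. This page does not describe how the operator bounds its output, so the approach is an assumption: the total row count is taken as known in advance, each row is kept with probability sampleSize / rowCount, and sampling stops once sampleSize rows have been kept. The class name SizeSamplingSketch and the method countSampled are hypothetical.

```java
import java.util.Random;

// Hypothetical illustration only; not the SampleRandomRows implementation.
class SizeSamplingSketch {

    // Assumption: rowCount is known up front. Each row is kept with
    // probability sampleSize / rowCount, and sampleSize acts as a hard cap.
    static long countSampled(long rowCount, long sampleSize, long seed) {
        if (rowCount <= 0 || sampleSize <= 0) {
            return 0;
        }
        double keepProbability = Math.min(1.0, (double) sampleSize / rowCount);
        Random random = new Random(seed);
        long kept = 0;
        for (long row = 0; row < rowCount && kept < sampleSize; row++) {
            if (random.nextDouble() < keepProbability) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Different input sizes, same requested sample size.
        System.out.println(countSampled(10_000, 2_500, 1L));
        System.out.println(countSampled(100_000, 2_500, 1L));
    }
}
```

In this sketch the result stays near 2500 for both input sizes and never exceeds it. When the input has fewer rows than sampleSize, every row is kept.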
The seed property is set to the current time (System.currentTimeMillis())
by default. Override this value to specify the random seed to use.
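The effect of the default can be sketched in plain Java. The example assumes that an unset seed is represented as null (the seed accessors use the boxed Long type) and that the fallback is System.currentTimeMillis(), as stated above. The class name SeedDefaultSketch and its methods are hypothetical and are not part of the operator's API.

```java
import java.util.Random;

// Hypothetical illustration only; not the SampleRandomRows implementation.
class SeedDefaultSketch {

    // Returns the caller's seed, or the current time when no seed is set.
    // Assumption: "not set" is modeled as null.
    static long resolveSeed(Long seed) {
        return (seed != null) ? seed : System.currentTimeMillis();
    }

    // Produces the first n pseudo-random draws for the resolved seed.
    static double[] firstDraws(Long seed, int n) {
        Random random = new Random(resolveSeed(seed));
        double[] draws = new double[n];
        for (int i = 0; i < n; i++) {
            draws[i] = random.nextDouble();
        }
        return draws;
    }

    public static void main(String[] args) {
        // A fixed seed repeats the same draws on every run;
        // a null seed falls back to the clock and usually differs between runs.
        System.out.println(firstDraws(7L, 1)[0]);
        System.out.println(firstDraws(null, 1)[0]);
    }
}
```

Supplying an explicit seed is the way to make a sampling run repeatable, for example in tests.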
Inherited fields: input, output

| Constructor | Description |
|---|---|
| SampleRandomRows() | Performs default random sampling on the data. |
| SampleRandomRows(double percent, long seed) | Perform random sampling selecting a fixed percentage of the input. |
| SampleRandomRows(long sampleSize, long seed) | Perform random sampling selecting a fixed number of records from the input data. |

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | compose(CompositionContext ctx) | Compose the body of this operator. |
| RecordPort | getInput() | Returns the input port. |
| SampleMode | getMode() | Get the sample mode. |
| RecordPort | getOutput() | Returns the output port. |
| double | getPercent() | Get the percentage of input data to output. |
| long | getSampleSize() | Get the sample size (in rows) of data wanted. |
| Long | getSeed() | Get the random number generator seed value. |
| void | setMode(SampleMode mode) | Set the sample mode. |
| void | setPercent(double percent) | Set the percentage of input data wanted. |
| void | setSampleSize(long sampleSize) | Set the wanted sample size in rows. |
| void | setSeed(Long seed) | Set the random number generator seed value. |

Methods inherited from superclasses: disableParallelism, getInputPorts, getOutputPorts, newInput, newInput, newOutput, newRecordInput, newRecordInput, newRecordOutput, notifyError
Methods inherited from java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interfaces: disableParallelism, getInputPorts, getOutputPorts

public SampleRandomRows()
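Performs default random sampling on the data.

public SampleRandomRows(double percent, long seed)
Perform random sampling selecting a fixed percentage of the input.
percent - percentage of the input data wanted
seed - seed value for the random number generator

public SampleRandomRows(long sampleSize, long seed)
Perform random sampling selecting a fixed number of records from the input data.
sampleSize - the wanted output sample size (in rows)
seed - seed value for the random number generator

public RecordPort getInput()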
Returns the input port (description from PipelineOperator).
getInput in interface PipelineOperator<RecordPort>
getInput in class AbstractRecordCompositeOperator

public RecordPort getOutput()
Returns the output port (description from PipelineOperator).
getOutput in interface PipelineOperator<RecordPort>
getOutput in class AbstractRecordCompositeOperator

public Long getSeed()
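Get the random number generator seed value.

public void setSeed(Long seed)
Set the random number generator seed value.
seed - random number generator seed value

public double getPercent()
Get the percentage of input data to output.

public void setPercent(double percent)
Set the percentage of input data wanted. Used with the BY_PERCENT sample mode.
percent - percentage of input data

public long getSampleSize()
Get the sample size (in rows) of data wanted.

public void setSampleSize(long sampleSize)
Set the wanted sample size in rows. Used with the BY_SIZE sample mode. The operator
will output approximately the sample size number of rows.
sampleSize - wanted sample size

public SampleMode getMode()
Get the sample mode.

public void setMode(SampleMode mode)
Set the sample mode.
mode - sample mode

protected void compose(CompositionContext ctx)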
Compose the body of this operator (description from CompositeOperator). Sub-operators are added
via OperatorComposable.add(O) and connected via OperatorComposable.connect(P, P). This includes
connections from the composite's input ports to sub-operators, connections between sub-operators, and
connections from sub-operators' output ports to the composite's output ports.
compose in class CompositeOperator
ctx - the context

Copyright © 2021 Actian Corporation. All rights reserved.