Package | Description |
---|---|
com.pervasive.datarush.analytics.text | Provides various unstructured text processing operators. |
com.pervasive.datarush.analytics.text.filter | |
Modifier and Type | Class and Description |
---|---|
class | TokenizedParagraph: A TextContainer that can store a tokenized paragraph. |
class | TokenizedSentence: A TextContainer that can store a tokenized sentence. |
class | TokenizedText: A TextContainer that can store a tokenized document. |
class | TokenizedWord: A TextContainer that can store a tokenized word. |
Modifier and Type | Field and Description |
---|---|
protected TextContainer | TextContainer.nextSibling |
protected TextContainer | TextContainer.parent |
protected TextContainer | TextContainer.prevSibling |
Modifier and Type | Field and Description |
---|---|
protected List<TextContainer> | TextContainer.children |
Modifier and Type | Method and Description |
---|---|
static TextContainer | TextTokenUtil.createTreeFromList(List<TextContainer> nodes): Creates a TextContainer from a list of TextContainer nodes. |
static TextContainer | TextTokenUtil.createTreeFromString(String textTokens) |
Modifier and Type | Method and Description |
---|---|
List<TextContainer> | TextNode.getChildren(): Gets the ordered list of direct children of this node. |
List<TextContainer> | TextContainer.getChildren() |
ListIterator<TextContainer> | TextNode.getIterator(): Gets an iterator over this node and all of its descendants. |
ListIterator<TextContainer> | TextContainer.getIterator() |
ListIterator<TextContainer> | TextNode.getIterator(TextElementType type): Gets an iterator over this node and all of its descendants that are of the specified type. |
ListIterator<TextContainer> | TextContainer.getIterator(TextElementType type) |
ListIterator<TextContainer> | TextContainer.getPostIterator() |
ListIterator<TextContainer> | TextContainer.getPostIterator(TextElementType type) |
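The iterator methods above walk a node together with its descendants, optionally restricted to one element type. The sketch below illustrates the general idea on a hypothetical stand-in tree; `TraversalSketch`, `Node`, `Kind`, `preOrder`, `postOrder`, and `sample` are illustrative names and not part of the DataRush API. It also assumes, from the method names only, that `getIterator` corresponds to a pre-order walk and `getPostIterator` to a post-order walk; the listing does not confirm this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tokenized text tree; NOT the library's TextContainer.
class TraversalSketch {
    enum Kind { TEXT, SENTENCE, WORD }

    static final class Node {
        final Kind kind;
        final String label;
        final List<Node> children;

        Node(Kind kind, String label, Node... children) {
            this.kind = kind;
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order: record the node first, then walk each child subtree in order.
    // A null filter means "accept every element type".
    static void preOrder(Node node, Kind filter, List<String> out) {
        if (filter == null || node.kind == filter) {
            out.add(node.label);
        }
        for (Node child : node.children) {
            preOrder(child, filter, out);
        }
    }

    // Post-order: walk each child subtree in order, then record the node.
    static void postOrder(Node node, Kind filter, List<String> out) {
        for (Node child : node.children) {
            postOrder(child, filter, out);
        }
        if (filter == null || node.kind == filter) {
            out.add(node.label);
        }
    }

    // A small document: two sentences holding three words in total.
    static Node sample() {
        Node s1 = new Node(Kind.SENTENCE, "s1",
                new Node(Kind.WORD, "a"), new Node(Kind.WORD, "b"));
        Node s2 = new Node(Kind.SENTENCE, "s2", new Node(Kind.WORD, "c"));
        return new Node(Kind.TEXT, "doc", s1, s2);
    }

    public static void main(String[] args) {
        List<String> pre = new ArrayList<>();
        preOrder(sample(), null, pre);
        List<String> post = new ArrayList<>();
        postOrder(sample(), null, post);
        System.out.println("pre-order:  " + pre);
        System.out.println("post-order: " + post);
    }
}
```

Passing `Kind.WORD` as the filter yields only the word labels, in document order for either walk, which mirrors the role of the `TextElementType` argument in the typed iterator overloads.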
Modifier and Type | Method and Description |
---|---|
static NGramMap | TextTokenUtil.calcNGramFreq(TextContainer text, int n): Creates an n-gram frequency model based on the contents of the TextContainer. |
static NGramMap | TextTokenUtil.calcNGramFreq(TextContainer text, int n, Set<NGram> nGramSet): Creates an n-gram frequency model containing the specified set of terms based on the contents of the TextContainer. |
static WordMap | TextTokenUtil.calcWordFreq(TextContainer text): Creates a term frequency model based on the contents of the TextContainer. |
static WordMap | TextTokenUtil.calcWordFreq(TextContainer text, Set<String> wordSet): Creates a term frequency model containing the specified set of terms based on the contents of the TextContainer. |
static int | TextTokenUtil.countElementType(TextContainer text, TextElementType type): Counts the number of elements of a specific type in the TextContainer. |
static Set<String> | TextTokenUtil.genBagOfWords(TextContainer text): Creates a bag of words based on the contents of the TextContainer. |
static List<NGram> | TextTokenUtil.generateNGramList(TextContainer text, int n): Lists the unique n-grams contained in the TextContainer. |
static List<String> | TextTokenUtil.generateWordList(TextContainer text): Lists the unique words contained in the TextContainer. |
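To make the term frequency and n-gram frequency models above more concrete, here is a minimal sketch that computes both over a flat list of word tokens. It is an illustration of the general technique only: `FrequencySketch`, `wordFreq`, and `nGramFreq` are hypothetical names, plain `Map` instances stand in for the library's `WordMap` and `NGramMap`, and treating a null word set as "count everything" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; it does not use or reproduce the DataRush TextTokenUtil class.
class FrequencySketch {

    // Counts occurrences of each word. When wordSet is non-null,
    // only words contained in it are counted.
    static Map<String, Integer> wordFreq(List<String> tokens, Set<String> wordSet) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (wordSet == null || wordSet.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Slides a window of n consecutive tokens over the list and counts
    // how often each distinct window (n-gram) occurs.
    static Map<List<String>, Integer> nGramFreq(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Copy the window so the map key does not depend on the source list.
            List<String> gram = new ArrayList<>(tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "saw", "the", "cat");
        System.out.println(wordFreq(tokens, null));
        System.out.println(nGramFreq(tokens, 2));
    }
}
```

The key set of the word map plays the role of a bag of words or unique word list, and the key set of the n-gram map corresponds to a list of unique n-grams.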
Modifier and Type | Method and Description |
---|---|
static TextContainer | TextTokenUtil.createTreeFromList(List<TextContainer> nodes): Creates a TextContainer from a list of TextContainer nodes. |
Constructor and Description |
---|
TextContainer(TextElementType type, List<? extends TextContainer> children): Constructs a container of the given element type with the specified children. |
TokenizedParagraph(List<? extends TextContainer> tokens): Creates a tokenized paragraph container. |
TokenizedSentence(List<? extends TextContainer> tokens): Creates a tokenized sentence container. |
TokenizedText(List<? extends TextContainer> tokens): Creates a tokenized document container. |
Modifier and Type | Method and Description |
---|---|
TextContainer | WordFilter.filterText(TextContainer text) |
TextContainer | TextFilter.filterText(TextContainer text): Returns the tokenized text with the filtered tokens removed. |
TextContainer | TextElementFilter.filterText(TextContainer text) |
TextContainer | RegexFilter.filterText(TextContainer text) |
TextContainer | PunctuationFilter.filterText(TextContainer text) |
TextContainer | LengthFilter.filterText(TextContainer text) |
TextContainer | AbstractTextFilter.filterText(TextContainer text) |
Modifier and Type | Method and Description |
---|---|
TextContainer | WordFilter.filterText(TextContainer text) |
TextContainer | TextFilter.filterText(TextContainer text): Returns the tokenized text with the filtered tokens removed. |
TextContainer | TextElementFilter.filterText(TextContainer text) |
TextContainer | RegexFilter.filterText(TextContainer text) |
TextContainer | PunctuationFilter.filterText(TextContainer text) |
TextContainer | LengthFilter.filterText(TextContainer text) |
TextContainer | AbstractTextFilter.filterText(TextContainer text) |
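Each `filterText` method above takes tokenized text and returns tokenized text with the filtered tokens removed. The sketch below shows one way such a filter could work, using word length as the criterion. It runs on a hypothetical stand-in tree: `FilterSketch`, `Node`, `filterByLength`, and `words` are illustrative names, and the choices to return a new tree, to leave the input untouched, and to keep groups that end up empty are assumptions of this sketch rather than documented behavior of `LengthFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; NOT the library's TextContainer or LengthFilter.
class FilterSketch {
    static final class Node {
        final String word;          // non-null only for leaf word tokens
        final List<Node> children;  // empty for leaves

        Node(String word, List<Node> children) {
            this.word = word;
            this.children = children;
        }

        static Node word(String w) {
            return new Node(w, new ArrayList<Node>());
        }

        static Node group(Node... kids) {
            return new Node(null, Arrays.asList(kids));
        }
    }

    // Returns a filtered copy of the tree in which word tokens whose length
    // falls outside [min, max] are dropped. Returns null only when the node
    // itself is a rejected word. Groups are rebuilt, even if they become empty.
    static Node filterByLength(Node node, int min, int max) {
        if (node.word != null) {
            int len = node.word.length();
            return (len >= min && len <= max) ? node : null;
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            Node filtered = filterByLength(child, min, max);
            if (filtered != null) {
                kept.add(filtered);
            }
        }
        return new Node(null, kept);
    }

    // Collects the leaf words of a tree in document order.
    static List<String> words(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.word != null) {
            out.add(node.word);
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Node doc = Node.group(
                Node.group(Node.word("a"), Node.word("cat")),
                Node.group(Node.word("extraordinary"), Node.word("dog")));
        System.out.println(words(filterByLength(doc, 2, 5)));
    }
}
```

Rebuilding the tree instead of mutating it keeps the original tokenization available for other filters, which is convenient when several filters are applied to the same text.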
Copyright © 2021 Actian Corporation. All rights reserved.