Module datarush.analytics
Package com.pervasive.datarush.analytics.text
Provides various unstructured text processing operators.

Interface Summary
Interface  Description
TextNode   Interface for a node in a TextContainer tree.

Class Summary
Class                    Description
CalculateNGramFrequency  Calculates the n-gram frequencies for a tokenized text field.
CalculateWordFrequency   Calculates the word frequencies for a tokenized text field.
ConvertTextCase          Converts the case on a TokenizedText field.
CountTokens              Counts the number of tokens in a tokenized text field.
DictionaryFilter         Filters a tokenized text field using a dictionary.
ExpandTextFrequency      Expands a text frequency field.
ExpandTextTokens         Expands a TokenizedText field.
FilterText               Filters a tokenized text field.
GenerateBagOfWords       Calculates the bag of words for a tokenized text field.
NGram                    Implementation of an n-gram.
NGramMap                 Implementation of an n-gram model.
RegexWordBreakIterator   A word break iterator that allows its default behavior for the Locale to be overridden by supplied regular expression rules.
TextContainer            A tree node that can hold information on text elements.
TextElement              Definition of a text element.
TextFrequencyFilter      Filters a frequency map field.
TextStemmer              Stems a TokenizedText field.
TextTokenizer            Tokenizes a string field as a TokenizedText object.
TextTokenUtil            Utility methods for operating on TextContainer objects.
TokenizedParagraph       A TextContainer that can store a tokenized paragraph.
TokenizedSentence        A TextContainer that can store a tokenized sentence.
TokenizedText            A TextContainer that can store a tokenized document.
TokenizedWord            A TextContainer that can store a tokenized word.
WordMap                  Implementation of a word frequency model.

Enum Summary
Enum                                Description
ConvertTextCase.Case
RegexWordBreakIterator.WordPattern
TextElementType                     Enumeration of the possible text and character groupings.
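
The tokenizing and frequency-counting operators summarized above (TextTokenizer, CalculateWordFrequency, CalculateNGramFrequency) can be pictured with a small standalone sketch. It does not use the DataRush API: the class NGramFrequencySketch and its methods tokenize and ngramFrequencies are hypothetical names built only on the JDK, and the token rules shown here (lower-casing, dropping punctuation, letting n-grams span sentence boundaries) are assumptions that may differ from what the operators in this package actually do.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of com.pervasive.datarush.analytics.text.
final class NGramFrequencySketch {

    // Splits text into lower-cased word tokens using the JDK's
    // locale-aware word BreakIterator. Segments that start with
    // whitespace or punctuation are skipped.
    static List<String> tokenize(String text, Locale locale) {
        List<String> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                tokens.add(segment.toLowerCase(locale));
            }
        }
        return tokens;
    }

    // Counts every run of n consecutive tokens. With n = 1 this is a
    // plain word-frequency count. Runs may span sentence boundaries
    // because this sketch keeps no sentence structure.
    static Map<String, Integer> ngramFrequencies(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The cat sat. The cat ran.", Locale.ENGLISH);
        System.out.println("tokens:   " + tokens);
        System.out.println("unigrams: " + ngramFrequencies(tokens, 1));
        System.out.println("bigrams:  " + ngramFrequencies(tokens, 2));
    }
}
```

A flat token list keeps the sketch short. The package itself models text as a tree of TextContainer nodes (TokenizedText, TokenizedParagraph, TokenizedSentence, TokenizedWord), which would allow counting to respect sentence or paragraph boundaries.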