public class TextTokenUtil extends Object
Constructor and Description |
---|
TextTokenUtil() |
Modifier and Type | Method and Description |
---|---|
static NGramMap | calcNGramFreq(TextContainer text, int n) Creates an n-gram frequency model based on the contents of the TextContainer. |
static NGramMap | calcNGramFreq(TextContainer text, int n, Set<NGram> nGramSet) Creates an n-gram frequency model containing the specified set of n-grams based on the contents of the TextContainer. |
static WordMap | calcWordFreq(TextContainer text) Creates a term frequency model based on the contents of the TextContainer. |
static WordMap | calcWordFreq(TextContainer text, Set<String> wordSet) Creates a term frequency model containing the specified set of terms based on the contents of the TextContainer. |
static int | countElementType(TextContainer text, TextElementType type) Counts the number of elements of a specific type in the TextContainer. |
static TextContainer | createTreeFromList(List<TextContainer> nodes) Creates a TextContainer from a list of TextContainer nodes. |
static TextContainer | createTreeFromString(String textTokens) |
static Set<String> | genBagOfWords(TextContainer text) Creates a bag of words based on the contents of the TextContainer. |
static List<NGram> | generateNGramList(TextContainer text, int n) Lists the unique n-grams contained in the TextContainer. |
static List<String> | generateWordList(TextContainer text) Lists the unique words contained in the TextContainer. |
static <K,V extends Comparable<V>> List<Map.Entry<K,V>> | sortMapByValue(Map<K,V> map) Sorts a map by the values associated with each key and returns a list of the sorted entries. |
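
The behavior described for sortMapByValue can be illustrated with plain JDK collections. The following is a minimal sketch, not the TextTokenUtil implementation: the class name SortByValueSketch is hypothetical, and ascending order is an assumption, since the description does not state the sort direction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in class; it only mirrors the documented generic signature.
class SortByValueSketch {

    // Copies the map's entries into a list and sorts them by value.
    // Ascending order is an assumption; the documented method may differ.
    static <K, V extends Comparable<V>> List<Map.Entry<K, V>> sortMapByValue(Map<K, V> map) {
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        // Example term counts, made up for illustration.
        Map<String, Integer> freq = new LinkedHashMap<>();
        freq.put("the", 7);
        freq.put("cat", 2);
        freq.put("sat", 4);

        for (Map.Entry<String, Integer> e : sortMapByValue(freq)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

If descending order is wanted in such a sketch, passing `Comparator.reverseOrder()` to `Map.Entry.comparingByValue` is one option.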

public static List<String> generateWordList(TextContainer text)
text - the container of tokenized text

public static List<NGram> generateNGramList(TextContainer text, int n)
text - the container of tokenized text
n - the degree of the n-grams

public static int countElementType(TextContainer text, TextElementType type)
text - the container of tokenized text
type - the type of text element to count

public static Set<String> genBagOfWords(TextContainer text)
text - the container of tokenized text

public static WordMap calcWordFreq(TextContainer text)
text - the container of tokenized text

public static WordMap calcWordFreq(TextContainer text, Set<String> wordSet)
text - the container of tokenized text
wordSet - the set of terms to include in the model

public static NGramMap calcNGramFreq(TextContainer text, int n)
text - the container of tokenized text
n - the degree of the n-grams

public static NGramMap calcNGramFreq(TextContainer text, int n, Set<NGram> nGramSet)
text - the container of tokenized text
n - the degree of the n-grams
nGramSet - the set of n-grams to include in the model

public static <K,V extends Comparable<V>> List<Map.Entry<K,V>> sortMapByValue(Map<K,V> map)
map - the map of entries that will be sorted

public static TextContainer createTreeFromList(List<TextContainer> nodes)
nodes - a list of TextContainers representing a pre-order traversal

public static TextContainer createTreeFromString(String textTokens)
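
The idea behind the two calcNGramFreq overloads (count n-grams of degree n, optionally restricted to a given set) can be sketched without the library types. This is an illustration under stated assumptions, not the library's code: NGramFreqSketch and countNGrams are hypothetical names, a plain List<String> of tokens stands in for TextContainer, a Map<List<String>, Integer> stands in for NGramMap, and the "frequency model" is assumed to hold raw counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in class; it does not use TextContainer, NGram, or NGramMap.
class NGramFreqSketch {

    // Counts each run of n consecutive tokens.
    // A null filter keeps every n-gram; otherwise only n-grams in the filter are counted.
    static Map<List<String>, Integer> countNGrams(List<String> tokens, int n,
                                                  Set<List<String>> filter) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<List<String>, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Copy the window so the key does not depend on the token list.
            List<String> gram = new ArrayList<>(tokens.subList(i, i + n));
            if (filter == null || filter.contains(gram)) {
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Example tokens, made up for illustration.
        List<String> tokens = Arrays.asList("to", "be", "or", "not", "to", "be");
        Map<List<String>, Integer> bigrams = countNGrams(tokens, 2, null);
        for (Map.Entry<List<String>, Integer> e : bigrams.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Passing a filter set corresponds to the three-argument overload: n-grams outside the set are skipped rather than stored with a zero count, which is a design choice of this sketch only.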
Copyright © 2020 Actian Corporation. All rights reserved.