- java.lang.Object
-
- com.pervasive.datarush.analytics.text.NGramMap
-
- com.pervasive.datarush.analytics.text.WordMap
-
public class WordMap extends NGramMap
Implementation of a word frequency model.
-
-
Constructor Summary
Constructors
Constructor Description
WordMap()
Default constructor of an empty word map.
WordMap(NGramMap map)
Convert a valid NGramMap into a word map.
WordMap(WordMap map)
Copy a word-to-frequency map.
WordMap(Map<String,Integer> map)
Create a word-to-frequency map.
WordMap(Map<String,Integer> map, int textSize)
Create a word-to-frequency map.
-
Method Summary
Modifier and Type Method Description
boolean
decreaseFreq(String word)
Removes a word from the map, or decreases its frequency if the absolute frequency is greater than one.
static TokenDecoder
getDecoder()
static TokenEncoder
getEncoder()
int
getFrequency(String word)
Get the absolute frequency of a word in the map.
double
getProbability(String word)
Get the relative frequency of a word in the map.
Map<String,Integer>
getStringMap()
Get a copy of the map that backs this object.
List<String>
getWordList()
Get an ordered list of the words contained in the map.
boolean
increaseFreq(String word)
Adds a word to the map or increases the frequency if it is already present.
int
removeWord(String word)
Removes a word from the map.
String
toString()
-
Methods inherited from class com.pervasive.datarush.analytics.text.NGramMap
calcOrigTextSize, decreaseFreq, equals, filterByThreshold, filterByTotal, getFrequency, getFrequencyList, getMap, getN, getNGramList, getOrigTextSize, getProbability, getProbabilityList, hashCode, increaseFreq, iterator, removeNGram, setOrigTextSize
-
-
-
-
Constructor Detail
-
WordMap
public WordMap()
Default constructor of an empty word map.
-
WordMap
public WordMap(Map<String,Integer> map)
Create a word-to-frequency map.
- Parameters:
map
- the mappings to use
-
WordMap
public WordMap(Map<String,Integer> map, int textSize)
Create a word-to-frequency map.
- Parameters:
map
- the mappings to use
textSize
- the number of elements in the original text
-
WordMap
public WordMap(WordMap map)
Copy a word-to-frequency map.
- Parameters:
map
- the word map to copy
-
WordMap
public WordMap(NGramMap map)
Convert a valid NGramMap into a word map. A valid map has an N of one. If the NGramMap is invalid, the WordMap will remain empty.
- Parameters:
map
- the n-gram map to convert
-
-
Method Detail
-
getWordList
public List<String> getWordList()
Get an ordered list of the words contained in the map.
- Returns:
- the list of words
-
getFrequency
public int getFrequency(String word)
Get the absolute frequency of a word in the map.
- Parameters:
word
- the word to get the frequency of
- Returns:
- the absolute frequency of the word
-
getProbability
public double getProbability(String word)
Get the relative frequency of a word in the map. If the original text size (see setOrigTextSize) has not been set, the value is calculated from the current contents of the map.
- Parameters:
word
- the word to get the relative frequency of
- Returns:
- the relative frequency of the word
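
The fallback described above can be illustrated with a small stand-alone sketch. It does not use the DataRush classes: ProbabilitySketch, its put helper, the use of 0 to mean "size not set", and the choice to sum the stored frequencies as the fallback total are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class ProbabilitySketch {
    private final Map<String, Integer> frequencies = new HashMap<>();
    // In this sketch, 0 means "original text size not set" (an assumption).
    private int origTextSize = 0;

    void put(String word, int frequency) {
        frequencies.put(word, frequency);
    }

    void setOrigTextSize(int size) {
        origTextSize = size;
    }

    // Relative frequency = absolute frequency / text size.
    double getProbability(String word) {
        // Fall back to the sum of the stored frequencies when no size was set.
        int total = origTextSize > 0
                ? origTextSize
                : frequencies.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) {
            return 0.0; // empty map: avoid dividing by zero
        }
        return (double) frequencies.getOrDefault(word, 0) / total;
    }

    public static void main(String[] args) {
        ProbabilitySketch sketch = new ProbabilitySketch();
        sketch.put("the", 3);
        sketch.put("cat", 1);
        System.out.println(sketch.getProbability("the")); // 3 of 4 stored occurrences
        sketch.setOrigTextSize(10);
        System.out.println(sketch.getProbability("the")); // 3 of 10 original elements
    }
}
```

Setting an explicit text size matters when the map has been filtered: the stored frequencies then no longer add up to the length of the original text.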
-
increaseFreq
public boolean increaseFreq(String word)
Adds a word to the map or increases the frequency if it is already present.
- Parameters:
word
- element to increase the frequency of in the map
- Returns:
- true if word is valid and could be incremented
-
decreaseFreq
public boolean decreaseFreq(String word)
Removes a word from the map, or decreases its frequency if the absolute frequency is greater than one.
- Parameters:
word
- element to decrease the frequency of in the map
- Returns:
- true if word is valid and could be decremented
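
The increase and decrease semantics described above can be sketched with a plain java.util.HashMap. FreqUpdateSketch is a hypothetical stand-in, not the DataRush class. Two points are assumptions, since the text does not define them: "valid" is taken to mean non-null and non-empty, and decreasing an absent word is taken to return false.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class FreqUpdateSketch {
    private final Map<String, Integer> frequencies = new HashMap<>();

    // Adds the word with frequency 1, or increments its existing frequency.
    boolean increaseFreq(String word) {
        if (word == null || word.isEmpty()) {
            return false; // assumed meaning of "valid"
        }
        frequencies.merge(word, 1, Integer::sum);
        return true;
    }

    // Decrements the frequency; removes the word when its frequency is one.
    boolean decreaseFreq(String word) {
        Integer current = frequencies.get(word);
        if (current == null) {
            return false; // assumed: an absent word cannot be decremented
        }
        if (current > 1) {
            frequencies.put(word, current - 1);
        } else {
            frequencies.remove(word);
        }
        return true;
    }

    // Absolute frequency, with 0 for words that are not in the map.
    int getFrequency(String word) {
        return frequencies.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        FreqUpdateSketch sketch = new FreqUpdateSketch();
        sketch.increaseFreq("cat");
        sketch.increaseFreq("cat");
        sketch.decreaseFreq("cat");
        System.out.println(sketch.getFrequency("cat")); // one occurrence left
        sketch.decreaseFreq("cat");
        System.out.println(sketch.getFrequency("cat")); // word removed
    }
}
```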
-
removeWord
public int removeWord(String word)
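Removes a word from the map.
- Parameters:
word
- element to remove from the map
- Returns:
- the frequency previously associated with the word. The return type is the primitive int, so null cannot be returned; the value returned for an absent word is not documented here.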
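
A method declared to return int cannot hand back null, so an implementation built on java.util.Map has to decide what to return for an absent word. The sketch below shows one option; RemoveWordSketch and its put helper are hypothetical, and returning 0 for an absent word is an assumption, not documented behavior of WordMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class RemoveWordSketch {
    private final Map<String, Integer> frequencies = new HashMap<>();

    void put(String word, int frequency) {
        frequencies.put(word, frequency);
    }

    // Map.remove returns null for an absent key. Unboxing that null to int
    // would throw NullPointerException, so this sketch substitutes 0 (an assumption).
    int removeWord(String word) {
        Integer previous = frequencies.remove(word);
        return previous == null ? 0 : previous;
    }

    public static void main(String[] args) {
        RemoveWordSketch sketch = new RemoveWordSketch();
        sketch.put("cat", 4);
        System.out.println(sketch.removeWord("cat")); // previous frequency
        System.out.println(sketch.removeWord("cat")); // already removed
    }
}
```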
-
getStringMap
public Map<String,Integer> getStringMap()
Get a copy of the map that backs this object.
- Returns:
- map of Strings to Integers
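
Because the method returns a copy, changes made to the returned map should not reach the word map itself. A minimal sketch of that defensive-copy pattern follows; CopySketch and its put helper are hypothetical, and the real class may build its copy differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the DataRush implementation.
final class CopySketch {
    private final Map<String, Integer> frequencies = new HashMap<>();

    void put(String word, int frequency) {
        frequencies.put(word, frequency);
    }

    // Returns a new map, so callers cannot modify the backing map through it.
    Map<String, Integer> getStringMap() {
        return new HashMap<>(frequencies);
    }

    int getFrequency(String word) {
        return frequencies.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        CopySketch sketch = new CopySketch();
        sketch.put("cat", 2);
        Map<String, Integer> copy = sketch.getStringMap();
        copy.put("cat", 99);  // changes only the copy
        copy.put("dog", 1);   // not visible to the sketch
        System.out.println(sketch.getFrequency("cat"));
        System.out.println(sketch.getFrequency("dog"));
    }
}
```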
-
getEncoder
public static TokenEncoder getEncoder()
-
getDecoder
public static TokenDecoder getDecoder()
-
-