java.lang.Object
com.pervasive.datarush.analytics.text.NGramMap
- Direct Known Subclasses:
WordMap
Implementation of an n-gram model, stored as a map from n-grams to their absolute frequencies. An n-gram model is a type of probabilistic language
model that predicts the next item in a sequence from the previous n-1 items in that
sequence.
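To make the description above concrete, here is a minimal, self-contained sketch of how an n-gram frequency map can be built and used for next-item prediction. It is not the DataRush implementation. The class name NGramCountSketch, its methods, and the use of List<String> in place of the NGram type are all hypothetical choices made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an n-gram model; not the DataRush NGramMap code.
// Each n-gram is an immutable List<String> of n consecutive tokens.
class NGramCountSketch {

    // Slide a window of width n over the tokens and count each n-gram.
    static Map<List<String>, Integer> count(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<List<String>, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            counts.merge(List.copyOf(tokens.subList(i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    // Predict the next token as the most frequent continuation of the
    // previous n-1 tokens; returns null if the prefix was never seen.
    // On ties, the first maximum found in HashMap iteration order wins.
    static String predictNext(Map<List<String>, Integer> counts, List<String> prefix) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<List<String>, Integer> e : counts.entrySet()) {
            List<String> gram = e.getKey();
            if (gram.size() == prefix.size() + 1
                    && gram.subList(0, prefix.size()).equals(prefix)
                    && e.getValue() > bestCount) {
                best = gram.get(prefix.size());
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        Map<List<String>, Integer> bigrams = count(tokens, 2);
        System.out.println(bigrams.get(List.of("to", "be")));     // 2
        System.out.println(predictNext(bigrams, List.of("not"))); // to
    }
}
```

With n = 2 the sketch predicts each word from the single word before it. A larger n conditions on a longer prefix at the cost of sparser counts.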
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
int calcOrigTextSize()
    Calculates the total number of elements in the original text based on the current contents of the map.
boolean decreaseFreq(NGram nGram)
    Removes an n-gram from the map, or decreases its frequency if the absolute frequency is greater than one.
boolean equals
void filterByThreshold(int min, int max)
    Filters this n-gram map, keeping only the n-grams whose frequencies lie between min and max, inclusive.
void filterByTotal(int total)
    Filters this n-gram map, removing all but the top frequencies.
static TokenDecoder getDecoder
static TokenEncoder getEncoder
int getFrequency(NGram ngram)
    Get the absolute frequency of an n-gram in the map.
getFrequencyList
    Get an ordered list of the absolute frequencies contained in the map.
getMap()
    Get a copy of the map that backs this object.
int getN()
    Get the degree of the n-grams in this map.
getNGramList
    Get an ordered list of the n-grams contained in the map.
int getOrigTextSize()
    Get the total number of elements in the original text.
double getProbability(NGram ngram)
    Get the relative frequency of an n-gram in the map.
getProbabilityList
    Get an ordered list of the relative frequencies contained in the map.
int hashCode()
boolean increaseFreq(NGram nGram)
    Adds an n-gram to the map, or increases its frequency if it is already present.
iterator()
    Get an iterator over the entries in the map.
int removeNGram(NGram nGram)
    Removes an n-gram from the map.
void setOrigTextSize(int origTextSize)
    Set the total number of elements in the original text.
String toString()
-
Field Details
-
map
-
-
Constructor Details
-
NGramMap
public NGramMap()
Default constructor.
NGramMap
public NGramMap(int n)
Create an n-gram to frequency map.
Parameters:
n - the degree of the n-grams in the map
-
NGramMap
Create an n-gram to frequency map from existing mappings.
Parameters:
n - the degree of the n-grams in the map
map - the mappings to use
-
NGramMap
Create an n-gram to frequency map from existing mappings and an original text size.
Parameters:
n - the degree of the n-grams in the map
map - the mappings to use
textSize - the number of elements in the original text
-
NGramMap
Create a copy of an n-gram to frequency map.
Parameters:
map - the n-gram map to copy
-
-
Method Details
-
getN
public int getN()
Get the degree of the n-grams in this map.
Returns:
the n used by the n-grams in this map
-
getOrigTextSize
public int getOrigTextSize()
Get the total number of elements in the original text.
Returns:
the original text size used to calculate the relative frequencies
-
setOrigTextSize
public void setOrigTextSize(int origTextSize)
Set the total number of elements in the original text.
Parameters:
origTextSize - the total number of elements (for example, words) in the original text
-
getNGramList
Get an ordered list of the n-grams contained in the map.
Returns:
the list of n-grams
-
getFrequencyList
Get an ordered list of the absolute frequencies contained in the map.
Returns:
the list of absolute frequencies
-
getProbabilityList
Get an ordered list of the relative frequencies contained in the map.
Returns:
the list of relative frequencies
-
getMap
Get a copy of the map that backs this object.
Returns:
a map from n-grams to their Integer frequencies
-
calcOrigTextSize
public int calcOrigTextSize()
Calculates the total number of elements in the original text based on the current contents of the map. The result is accurate only if the map still holds the complete model created from the text; if the map has been reduced to a subset (for example, by filtering), the calculated size will not match the original text.
Returns:
the total of all elements in the map
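The caveat about subsets can be seen in a small sketch. It assumes, without confirmation from this page, that the size is computed by summing the absolute frequencies, and it uses unigram (n = 1) String keys so that the sum equals the word count. The class TextSizeSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: assumes the original text size is recovered by
// summing the absolute frequencies of unigram (n = 1) entries.
class TextSizeSketch {

    static int calcOrigTextSize(Map<String, Integer> freqs) {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        // Frequencies from a six-word text such as "a a a b b c".
        Map<String, Integer> freqs = new HashMap<>(Map.of("a", 3, "b", 2, "c", 1));
        System.out.println(calcOrigTextSize(freqs)); // 6, matches the text

        // Filtering leaves a subset of the model, so the result drifts.
        freqs.values().removeIf(f -> f < 2);
        System.out.println(calcOrigTextSize(freqs)); // 5, no longer matches
    }
}
```

For n > 1 with overlapping windows, a text of L elements yields L - n + 1 n-grams, so the relationship between the frequency sum and the text length may differ from this unigram case.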
-
iterator
Get an iterator over the entries in the map.
Returns:
an iterator over the map entries
-
getFrequency
Get the absolute frequency of an n-gram in the map.
Parameters:
ngram - the n-gram to get the frequency of
Returns:
the absolute frequency of the n-gram
-
getProbability
Get the relative frequency of an n-gram in the map. If the original text size has not been set, it is calculated from the current contents of the map.
Parameters:
ngram - the n-gram to get the relative frequency of
Returns:
the relative frequency of the n-gram
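The fallback described for getProbability can be sketched as follows. This is an illustration under stated assumptions, not the library's code: relative frequency is taken to be absolute frequency divided by the original text size, a size of 0 is assumed to mean "not set", and the fallback is assumed to be the sum of the frequencies. ProbabilitySketch and its add method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: relative frequency = absolute frequency / text size.
// Assumes a size of 0 means "not set", in which case the size is computed
// from the current map (here, as the sum of its frequencies).
class ProbabilitySketch {
    private final Map<String, Integer> freqs = new HashMap<>();
    private int origTextSize; // 0 = not set (assumption)

    void add(String gram, int count) {
        freqs.merge(gram, count, Integer::sum);
    }

    void setOrigTextSize(int size) {
        origTextSize = size;
    }

    double getProbability(String gram) {
        int size = origTextSize > 0 ? origTextSize : sumOfFrequencies();
        if (size == 0) {
            return 0.0; // empty map: avoid division by zero
        }
        return freqs.getOrDefault(gram, 0) / (double) size;
    }

    private int sumOfFrequencies() {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        ProbabilitySketch m = new ProbabilitySketch();
        m.add("the", 3);
        m.add("cat", 1);
        System.out.println(m.getProbability("the")); // 0.75 (size computed as 4)
        m.setOrigTextSize(10);
        System.out.println(m.getProbability("the")); // 0.3 (size set to 10)
    }
}
```

Setting the size explicitly matters once the map has been filtered, because a size recomputed from a subset would inflate every relative frequency.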
-
increaseFreq
Adds an n-gram to the map, or increases its frequency if it is already present.
Parameters:
nGram - the element whose frequency to increase in the map
Returns:
true if the n-gram is valid and could be incremented
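The increment behavior maps naturally onto java.util.Map.merge, as in this sketch. The page does not say what makes an n-gram "valid", so this sketch assumes it means non-null with exactly n tokens. IncreaseSketch is hypothetical, and List<String> stands in for the NGram type.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of increaseFreq. "Valid" is assumed to mean a
// non-null n-gram with exactly n tokens; the real check is not documented.
class IncreaseSketch {
    private final int n;
    private final Map<List<String>, Integer> freqs = new HashMap<>();

    IncreaseSketch(int n) {
        this.n = n;
    }

    boolean increaseFreq(List<String> gram) {
        if (gram == null || gram.size() != n) {
            return false;
        }
        // Insert with frequency 1, or add 1 to the existing frequency.
        freqs.merge(List.copyOf(gram), 1, Integer::sum);
        return true;
    }

    int getFrequency(List<String> gram) {
        return freqs.getOrDefault(gram, 0);
    }

    public static void main(String[] args) {
        IncreaseSketch m = new IncreaseSketch(2);
        m.increaseFreq(List.of("new", "york"));
        m.increaseFreq(List.of("new", "york"));
        System.out.println(m.getFrequency(List.of("new", "york"))); // 2
        System.out.println(m.increaseFreq(List.of("york")));        // false: wrong degree
    }
}
```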
-
decreaseFreq
Removes an n-gram from the map, or decreases its frequency if the absolute frequency is greater than one.
Parameters:
nGram - the element whose frequency to decrease in the map
Returns:
true if the n-gram is valid and could be decremented
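The two cases of decreaseFreq (decrement when the frequency is above one, remove otherwise) can be sketched like this. This sketch assumes "valid" means the n-gram is currently present, and it uses unigram String keys for brevity. DecreaseSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of decreaseFreq on unigram (String) keys. Assumes
// "valid" means the n-gram is currently present in the map.
class DecreaseSketch {
    final Map<String, Integer> freqs = new HashMap<>();

    boolean decreaseFreq(String gram) {
        Integer current = freqs.get(gram);
        if (current == null) {
            return false; // not present: nothing to decrement
        }
        if (current > 1) {
            freqs.put(gram, current - 1); // frequency above one: decrement
        } else {
            freqs.remove(gram); // otherwise: remove the entry entirely
        }
        return true;
    }

    public static void main(String[] args) {
        DecreaseSketch m = new DecreaseSketch();
        m.freqs.put("cat", 2);
        m.decreaseFreq("cat");
        System.out.println(m.freqs.get("cat"));         // 1
        m.decreaseFreq("cat");
        System.out.println(m.freqs.containsKey("cat")); // false
        System.out.println(m.decreaseFreq("cat"));      // false
    }
}
```

Removing the entry at frequency one, rather than storing zero, keeps the map free of n-grams that no longer occur.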
-
removeNGram
Removes an n-gram from the map.
Parameters:
nGram - the element to remove from the map
Returns:
the frequency previously associated with the n-gram, or null
-
filterByThreshold
public void filterByThreshold(int min, int max)
Filters this n-gram map, keeping only the n-grams whose frequencies lie between min and max, inclusive.
Parameters:
min - the smallest frequency to keep
max - the largest frequency to keep
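The inclusive bounds of filterByThreshold amount to the predicate min <= f <= max. This sketch applies that predicate to a plain java.util.Map through its values() view. ThresholdSketch is a hypothetical name, and the real method operates on the NGramMap instance rather than on a Map argument.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of filterByThreshold on a plain Map: keep entries
// whose frequency f satisfies min <= f <= max, drop everything else.
class ThresholdSketch {

    static void filterByThreshold(Map<String, Integer> freqs, int min, int max) {
        // Removing through the values() view also removes the keys.
        freqs.values().removeIf(f -> f < min || f > max);
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>(Map.of("a", 1, "b", 2, "c", 3, "d", 4));
        filterByThreshold(freqs, 2, 3);
        System.out.println(freqs.keySet()); // b and c remain (order not guaranteed)
    }
}
```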
-
filterByTotal
public void filterByTotal(int total)
Filters this n-gram map, removing all but the top frequencies.
Parameters:
total - the number of top frequencies to keep
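One reading of filterByTotal is "keep the total most frequent n-grams". The sketch below implements that reading, but it is an assumption: the page does not say whether total counts n-grams or distinct frequency values, or how ties are broken. Here, ties go to ascending key order so the result is deterministic. TopTotalSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of filterByTotal. Assumes "total" counts n-grams
// (not distinct frequency values) and breaks ties by key order.
class TopTotalSketch {

    static void filterByTotal(Map<String, Integer> freqs, int total) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqs.entrySet());
        // Highest frequency first; equal frequencies in ascending key order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        Set<String> keep = new HashSet<>();
        int limit = Math.min(Math.max(total, 0), entries.size());
        for (int i = 0; i < limit; i++) {
            keep.add(entries.get(i).getKey());
        }
        freqs.keySet().retainAll(keep); // drop everything outside the top total
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>(Map.of("a", 5, "b", 3, "c", 3, "d", 1));
        filterByTotal(freqs, 2);
        System.out.println(freqs); // a=5 and b=3 remain (HashMap order may vary)
    }
}
```

Under the alternative reading (keep every n-gram whose frequency is among the top total distinct values), both b and c would survive in this example.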
-
toString
-
hashCode
public int hashCode()
equals
-
getEncoder
-
getDecoder
-