- java.lang.Object
-
- com.pervasive.datarush.analytics.text.NGramMap
-
-
Constructor Summary
Constructor and Description
NGramMap()
Default constructor.
NGramMap(int n)
Create an n-gram to frequency map.
NGramMap(int n, Map<NGram,Integer> map)
Create an n-gram to frequency map.
NGramMap(int n, Map<NGram,Integer> map, int textSize)
Create an n-gram to frequency map.
NGramMap(NGramMap map)
Copy an n-gram to frequency map.
-
Method Summary
Modifier and Type, Method, and Description
int calcOrigTextSize()
Calculates the total number of elements in the original text based on the current contents of the map.
boolean decreaseFreq(NGram nGram)
Decreases the frequency of an n-gram if its absolute frequency is greater than one; otherwise removes the n-gram from the map.
boolean equals(Object obj)
void filterByThreshold(int min, int max)
Filters this n-gram map, keeping only the n-grams whose frequencies lie between min and max, inclusive.
void filterByTotal(int total)
Filters this n-gram map, keeping only the top frequencies.
static TokenDecoder getDecoder()
static TokenEncoder getEncoder()
int getFrequency(NGram ngram)
Get the absolute frequency of an n-gram in the map.
List<Integer> getFrequencyList()
Get an ordered list of the absolute frequencies contained in the map.
Map<NGram,Integer> getMap()
Get a copy of the map that backs this object.
int getN()
Get the degree of the n-grams in this map.
List<NGram> getNGramList()
Get an ordered list of the n-grams contained in the map.
int getOrigTextSize()
Get the total number of elements in the original text.
double getProbability(NGram ngram)
Get the relative frequency of an n-gram in the map.
List<Double> getProbabilityList()
Get an ordered list of the relative frequencies contained in the map.
int hashCode()
boolean increaseFreq(NGram nGram)
Adds an n-gram to the map or increases the frequency if it is already present.
Iterator<Map.Entry<NGram,Integer>> iterator()
Get an iterator over the entries in the map.
int removeNGram(NGram nGram)
Removes an n-gram from the map.
void setOrigTextSize(int origTextSize)
Set the total number of elements in the original text.
String toString()
-
-
-
Constructor Detail
-
NGramMap
public NGramMap()
Default constructor
-
NGramMap
public NGramMap(int n)
Create an n-gram to frequency map.
Parameters:
n - the degree of the NGrams in the map
-
NGramMap
public NGramMap(int n, Map<NGram,Integer> map)
Create an n-gram to frequency map.
Parameters:
n - the degree of the NGrams in the map
map - the mappings to use
-
NGramMap
public NGramMap(int n, Map<NGram,Integer> map, int textSize)
Create an n-gram to frequency map.
Parameters:
n - the degree of the NGrams in the map
map - the mappings to use
textSize - the number of elements in the original text
-
NGramMap
public NGramMap(NGramMap map)
Copy an n-gram to frequency map.
Parameters:
map - the n-gram map to copy
-
-
Method Detail
-
getN
public int getN()
Get the degree of the n-grams in this map.
Returns:
the n used by the NGrams in this map
-
getOrigTextSize
public int getOrigTextSize()
Get the total number of elements in the original text.
Returns:
the origTextSize used to calculate the relative frequencies
-
setOrigTextSize
public void setOrigTextSize(int origTextSize)
Set the total number of elements in the original text.
Parameters:
origTextSize - the total number of elements in the original text
-
getNGramList
public List<NGram> getNGramList()
Get an ordered list of the n-grams contained in the map.
Returns:
the list of NGrams
-
getFrequencyList
public List<Integer> getFrequencyList()
Get an ordered list of the absolute frequencies contained in the map.
Returns:
the list of frequencies
-
getProbabilityList
public List<Double> getProbabilityList()
Get an ordered list of the relative frequencies contained in the map.
Returns:
the list of relative frequencies
-
getMap
public Map<NGram,Integer> getMap()
Get a copy of the map that backs this object.
Returns:
a map of n-grams to their integer frequencies
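Because getMap() is documented as returning a copy, changes made to the returned map should not reach the NGramMap itself. The following is a minimal sketch of that defensive-copy pattern only. It does not use the DataRush classes: CopySketch is a hypothetical stand-in, List<String> takes the place of NGram, and reporting 0 for an absent n-gram is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an n-gram frequency map; not the DataRush class.
class CopySketch {
    private final Map<List<String>, Integer> freqs = new HashMap<>();

    // Count one more occurrence of the n-gram.
    void add(List<String> ngram) {
        freqs.merge(ngram, 1, Integer::sum);
    }

    // Return a copy so callers cannot alter the backing map.
    Map<List<String>, Integer> getMap() {
        return new HashMap<>(freqs);
    }

    // Assumption: an absent n-gram has frequency 0.
    int getFrequency(List<String> ngram) {
        return freqs.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        CopySketch sketch = new CopySketch();
        sketch.add(List.of("the", "cat"));
        Map<List<String>, Integer> copy = sketch.getMap();
        copy.clear(); // only the copy is emptied
        System.out.println(sketch.getFrequency(List.of("the", "cat"))); // prints 1
    }
}
```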
-
calcOrigTextSize
public int calcOrigTextSize()
Calculates the total number of elements in the original text based on the current contents of the map. This will only be accurate if the map is not a subset of the original model created from the text.
Returns:
the total of all elements in the map
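The accuracy caveat is easier to see with numbers. The sketch below assumes the total is the sum of all absolute frequencies in the map, which the description suggests but does not state outright. TextSizeSketch and calcTextSize are hypothetical names, and List<String> stands in for NGram.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the DataRush implementation.
class TextSizeSketch {
    // Assumption: the text size is the sum of every absolute frequency.
    static int calcTextSize(Map<List<String>, Integer> freqs) {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> freqs = new HashMap<>();
        freqs.put(List.of("the", "cat"), 3);
        freqs.put(List.of("cat", "sat"), 1);
        System.out.println(calcTextSize(freqs)); // prints 4

        // Once entries are dropped, the map is a subset and the total shrinks.
        freqs.remove(List.of("cat", "sat"));
        System.out.println(calcTextSize(freqs)); // prints 3
    }
}
```

Under this assumption, a map that has been filtered no longer reproduces the size of the original text, which is why a stored original text size is kept separately.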
-
iterator
public Iterator<Map.Entry<NGram,Integer>> iterator()
Get an iterator over the entries in the map.
Returns:
an iterator over the map entries
-
getFrequency
public int getFrequency(NGram ngram)
Get the absolute frequency of an NGram in the map.
Parameters:
ngram - the n-gram to get the frequency of
Returns:
the absolute frequency of the n-gram
-
getProbability
public double getProbability(NGram ngram)
Get the relative frequency of an n-gram in the map. If the original text size has not been set, it is calculated from the current contents of the map.
Parameters:
ngram - the n-gram to get the relative frequency of
Returns:
the relative frequency of the n-gram
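As a worked example, the relative frequency can be read as the absolute frequency divided by the original text size. The sketch below makes several assumptions that the description does not confirm: a size of 0 or less means "not set", the fallback size is the sum of all frequencies, and an absent n-gram has probability 0. ProbabilitySketch is a hypothetical name and List<String> stands in for NGram.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the DataRush implementation.
class ProbabilitySketch {
    static double probability(Map<List<String>, Integer> freqs,
                              List<String> ngram, int origTextSize) {
        int size = origTextSize;
        if (size <= 0) {
            // Assumption: fall back to the sum of the frequencies in the map.
            size = 0;
            for (int f : freqs.values()) {
                size += f;
            }
        }
        if (size == 0) {
            return 0.0; // empty map and no stored size
        }
        return freqs.getOrDefault(ngram, 0) / (double) size;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> freqs = new HashMap<>();
        freqs.put(List.of("the", "cat"), 3);
        freqs.put(List.of("cat", "sat"), 1);
        // Size not set: 3 / (3 + 1)
        System.out.println(probability(freqs, List.of("the", "cat"), 0));  // prints 0.75
        // Size set to 10: 3 / 10
        System.out.println(probability(freqs, List.of("the", "cat"), 10)); // prints 0.3
    }
}
```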
-
increaseFreq
public boolean increaseFreq(NGram nGram)
Adds an n-gram to the map or increases the frequency if it is already present.
Parameters:
nGram - element to increase the frequency of in the map
Returns:
true if n-gram is valid and could be incremented
-
decreaseFreq
public boolean decreaseFreq(NGram nGram)
Decreases the frequency of an n-gram if its absolute frequency is greater than one; otherwise removes the n-gram from the map.
Parameters:
nGram - element to decrease the frequency of in the map
Returns:
true if n-gram is valid and could be decremented
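The increment and decrement behavior described for increaseFreq and decreaseFreq can be sketched with a plain java.util.Map. This is an illustration of the counting rules only, not the DataRush implementation: FrequencySketch is a hypothetical name, List<String> stands in for NGram, and the boolean validity result of the real methods is not modeled because the page does not say what makes an n-gram valid.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the DataRush implementation.
class FrequencySketch {
    // Insert with frequency 1, or add 1 to the existing frequency.
    static void increase(Map<List<String>, Integer> freqs, List<String> ngram) {
        freqs.merge(ngram, 1, Integer::sum);
    }

    // Subtract 1 when the frequency is above one; otherwise drop the entry.
    // Returning null from the remapping function removes the mapping.
    static void decrease(Map<List<String>, Integer> freqs, List<String> ngram) {
        freqs.computeIfPresent(ngram, (key, f) -> f > 1 ? f - 1 : null);
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> freqs = new HashMap<>();
        List<String> ngram = List.of("the", "cat");
        increase(freqs, ngram);
        increase(freqs, ngram);
        decrease(freqs, ngram);
        System.out.println(freqs.get(ngram));         // prints 1
        decrease(freqs, ngram);
        System.out.println(freqs.containsKey(ngram)); // prints false
    }
}
```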
-
removeNGram
public int removeNGram(NGram nGram)
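Removes an n-gram from the map.
Parameters:
nGram - element to remove from the map
Returns:
the frequency previously associated with the n-gram. The return type is a primitive int, so the result cannot be null; the value returned for an absent n-gram is not specified here.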
-
filterByThreshold
public void filterByThreshold(int min, int max)
Filters this n-gram map, keeping only the n-grams whose frequencies lie between min and max, inclusive.
Parameters:
min - the smallest frequency to keep
max - the largest frequency to keep
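Going by the parameter descriptions, min and max are the bounds of what is kept, so everything outside the range is removed. A minimal sketch of that reading follows. ThresholdSketch is a hypothetical name, List<String> stands in for NGram, and the code is not the DataRush implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the DataRush implementation.
class ThresholdSketch {
    // Remove, in place, every entry whose frequency lies outside [min, max].
    static void filterByThreshold(Map<List<String>, Integer> freqs, int min, int max) {
        freqs.values().removeIf(f -> f < min || f > max);
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> freqs = new HashMap<>();
        freqs.put(List.of("the", "cat"), 7);
        freqs.put(List.of("cat", "sat"), 3);
        freqs.put(List.of("sat", "on"), 1);
        filterByThreshold(freqs, 2, 5);
        System.out.println(freqs); // prints {[cat, sat]=3}
    }
}
```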
-
filterByTotal
public void filterByTotal(int total)
Filters this n-gram map, keeping only the top frequencies.
Parameters:
total - number of top frequencies to keep
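One way to read "top frequencies" is "the total entries with the highest absolute frequency". The sketch below implements that reading. It is an assumption, since the description could also mean the top distinct frequency values, and it leaves the handling of ties unspecified. TopSketch is a hypothetical name and List<String> stands in for NGram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the DataRush implementation.
class TopSketch {
    // Keep, in place, only the `total` entries with the highest frequency.
    static void filterByTotal(Map<List<String>, Integer> freqs, int total) {
        List<Map.Entry<List<String>, Integer>> entries = new ArrayList<>(freqs.entrySet());
        // Sort by frequency, highest first.
        entries.sort(Map.Entry.<List<String>, Integer>comparingByValue().reversed());

        int keepCount = Math.min(Math.max(total, 0), entries.size());
        Set<List<String>> keep = new HashSet<>();
        for (Map.Entry<List<String>, Integer> e : entries.subList(0, keepCount)) {
            keep.add(e.getKey());
        }
        freqs.keySet().retainAll(keep);
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> freqs = new HashMap<>();
        freqs.put(List.of("the", "cat"), 5);
        freqs.put(List.of("cat", "sat"), 2);
        freqs.put(List.of("sat", "on"), 9);
        filterByTotal(freqs, 2);
        System.out.println(freqs.keySet().contains(List.of("cat", "sat"))); // prints false
        System.out.println(freqs.size());                                   // prints 2
    }
}
```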
-
getEncoder
public static TokenEncoder getEncoder()
-
getDecoder
public static TokenDecoder getDecoder()
-
-