java.lang.Object
com.pervasive.datarush.analytics.text.NGramMap
Direct Known Subclasses:
WordMap

public class NGramMap extends Object
Implementation of an n-gram model, stored as a map from n-grams to their frequencies. An n-gram model is a probabilistic language model that predicts the next item in a sequence from the preceding n-1 items.
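The counting idea behind an n-gram to frequency map can be sketched with plain java.util collections. This is an illustration only, not this class's implementation: the class name NGramCountSketch, the helper countNGrams, and the use of List<String> as a stand-in for NGram are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramCountSketch {
    // Hypothetical helper: count every run of n consecutive tokens.
    // A List<String> stands in for the library's NGram type.
    static Map<List<String>, Integer> countNGrams(List<String> tokens, int n) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Copy the window so the key does not depend on the source list.
            List<String> gram = new ArrayList<>(tokens.subList(i, i + n));
            // Insert with count 1, or add 1 to the existing count.
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "saw", "the", "cat");
        // prints {[the, cat]=2, [cat, saw]=1, [saw, the]=1}
        System.out.println(countNGrams(tokens, 2));
    }
}
```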
  • Constructor Details

    • NGramMap

      public NGramMap()
      Default constructor
    • NGramMap

      public NGramMap(int n)
      Create an n-gram to frequency map.
      Parameters:
      n - the degree of the NGrams in the map
    • NGramMap

      public NGramMap(int n, Map<NGram,Integer> map)
      Create an n-gram to frequency map.
      Parameters:
      n - the degree of the NGrams in the map
      map - the mappings to use
    • NGramMap

      public NGramMap(int n, Map<NGram,Integer> map, int textSize)
      Create an n-gram to frequency map.
      Parameters:
      n - the degree of the NGrams in the map
      map - the mappings to use
      textSize - the number of elements in the original text
    • NGramMap

      public NGramMap(NGramMap map)
      Copy an n-gram to frequency map.
      Parameters:
      map - the n-gram map to copy
  • Method Details

    • getN

      public int getN()
      Get the degree of the n-grams in this map.
      Returns:
      the n used by the NGrams in this map
    • getOrigTextSize

      public int getOrigTextSize()
      Get the total number of elements in the original text.
      Returns:
      the origTextSize used to calculate the relative frequencies
    • setOrigTextSize

      public void setOrigTextSize(int origTextSize)
      Set the total number of elements in the original text.
      Parameters:
      origTextSize - the total number of elements in the original text
    • getNGramList

      public List<NGram> getNGramList()
      Get an ordered list of the n-grams contained in the map.
      Returns:
      the list of NGrams
    • getFrequencyList

      public List<Integer> getFrequencyList()
      Get an ordered list of the absolute frequencies contained in the map.
      Returns:
      the list of frequencies
    • getProbabilityList

      public List<Double> getProbabilityList()
      Get an ordered list of the relative frequencies contained in the map.
      Returns:
      the list of relative frequencies
    • getMap

      public Map<NGram,Integer> getMap()
      Get a copy of the map that backs this object.
      Returns:
      a copy of the map of NGrams to their Integer frequencies
    • calcOrigTextSize

      public int calcOrigTextSize()
      Calculates the total number of elements in the original text based on the current contents of the map. The result is accurate only if the map holds the complete model created from the text rather than a subset of it.
      Returns:
      the total of all elements in the map
    • iterator

      public Iterator<Map.Entry<NGram,Integer>> iterator()
      Get an iterator over the entries in the map.
      Returns:
      an iterator over the map entries
    • getFrequency

      public int getFrequency(NGram ngram)
      Get the absolute frequency of an NGram in the map.
      Parameters:
      ngram - the n-gram to get the frequency of
      Returns:
      the absolute frequency of the n-gram
    • getProbability

      public double getProbability(NGram ngram)
      Get the relative frequency of an n-gram in the map. If the original text size has not been set, it is calculated from the current contents of the map.
      Parameters:
      ngram - the n-gram to get the relative frequency of
      Returns:
      the relative frequency of the n-gram
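      The relative-frequency calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: String keys stand in for NGram, the names RelativeFrequencySketch, totalCount, and probability are hypothetical, and a size of zero or less is assumed to mean "not set".

```java
import java.util.HashMap;
import java.util.Map;

public class RelativeFrequencySketch {
    // Hypothetical stand-in for the calcOrigTextSize idea: sum all counts.
    static int totalCount(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Assumption: origTextSize <= 0 means the size was never set,
    // in which case the total is derived from the map itself.
    static double probability(Map<String, Integer> counts, String gram, int origTextSize) {
        int size = origTextSize > 0 ? origTextSize : totalCount(counts);
        if (size == 0) {
            return 0.0; // empty model: avoid dividing by zero
        }
        return counts.getOrDefault(gram, 0) / (double) size;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the cat", 3);
        counts.put("cat saw", 1);
        System.out.println(probability(counts, "the cat", 0)); // prints 0.75
        System.out.println(probability(counts, "the cat", 6)); // prints 0.5
    }
}
```

      The second call shows why the stored text size matters: once entries have been removed from the map, the sum of the remaining counts no longer equals the size of the original text.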
    • increaseFreq

      public boolean increaseFreq(NGram nGram)
      Adds an n-gram to the map or increases the frequency if it is already present.
      Parameters:
      nGram - element to increase the frequency of in the map
      Returns:
      true if n-gram is valid and could be incremented
    • decreaseFreq

      public boolean decreaseFreq(NGram nGram)
      Decreases the frequency of an n-gram if its absolute frequency is greater than one; otherwise removes the n-gram from the map.
      Parameters:
      nGram - element to decrease the frequency of in the map
      Returns:
      true if n-gram is valid and could be decremented
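      The increment and decrement semantics of increaseFreq and decreaseFreq can be sketched on a plain map. This is an illustration, not the library's implementation: String keys stand in for NGram, the class and method names are hypothetical, and treating a null key as invalid and an absent key as "could not be decremented" are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class FrequencyUpdateSketch {
    // Add the key with count 1, or bump an existing count by 1.
    static boolean increase(Map<String, Integer> counts, String gram) {
        if (gram == null) {
            return false; // assumption: null counts as an invalid n-gram
        }
        counts.merge(gram, 1, Integer::sum);
        return true;
    }

    // Lower the count by 1, removing the key when the count is 1.
    static boolean decrease(Map<String, Integer> counts, String gram) {
        Integer current = counts.get(gram);
        if (current == null) {
            return false; // assumption: nothing to decrement
        }
        if (current > 1) {
            counts.put(gram, current - 1);
        } else {
            counts.remove(gram);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        increase(counts, "the cat");
        increase(counts, "the cat");
        decrease(counts, "the cat");
        System.out.println(counts); // prints {the cat=1}
        decrease(counts, "the cat");
        System.out.println(counts); // prints {}
    }
}
```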
    • removeNGram

      public int removeNGram(NGram nGram)
      Removes an n-gram from the map.
      Parameters:
      nGram - element to remove from the map
      Returns:
      the frequency previously associated with the n-gram or null
    • filterByThreshold

      public void filterByThreshold(int min, int max)
      Filters this n-gram map, keeping only the n-grams whose frequencies lie between min and max, inclusive.
      Parameters:
      min - the smallest frequency to keep
      max - the largest frequency to keep
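      A threshold filter of this kind can be sketched on a plain map as follows. This is an illustration only: String keys stand in for NGram, and ThresholdFilterSketch and its keepBetween method are hypothetical names, not part of this API.

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdFilterSketch {
    // Keep only entries whose frequency f satisfies min <= f <= max.
    // Removing through the values view also removes the matching keys.
    static void keepBetween(Map<String, Integer> counts, int min, int max) {
        counts.values().removeIf(f -> f < min || f > max);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("typical", 5);
        counts.put("ubiquitous", 10);
        keepBetween(counts, 2, 5);
        System.out.println(counts); // prints {typical=5}
    }
}
```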
    • filterByTotal

      public void filterByTotal(int total)
      Filters this n-gram map, keeping only the n-grams with the highest frequencies.
      Parameters:
      total - number of top frequencies to keep
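      Keeping only the highest-frequency entries can be sketched as a sort followed by a cut. This is an illustration, not the library's implementation: String keys stand in for NGram, TopFrequencySketch and topByFrequency are hypothetical names, and the tie-breaking shown here (input order, because the sort is stable) is an assumption; how this class resolves ties is not stated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopFrequencySketch {
    // Return a new map holding the `total` entries with the largest counts.
    static Map<String, Integer> topByFrequency(Map<String, Integer> counts, int total) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count, largest first; List.sort is stable, so ties keep input order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        int limit = Math.min(Math.max(total, 0), entries.size());
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, limit)) {
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("the cat", 5);
        counts.put("cat saw", 1);
        counts.put("saw the", 3);
        System.out.println(topByFrequency(counts, 2)); // prints {the cat=5, saw the=3}
    }
}
```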
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • getEncoder

      public static TokenEncoder getEncoder()
    • getDecoder

      public static TokenDecoder getDecoder()