Class NGramMap

  • Direct Known Subclasses:
    WordMap

    public class NGramMap
    extends Object
    Implementation of an n-gram model. An n-gram model is a type of probabilistic language model for predicting the next item in a sequence based on the previous items in the sequence.
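As a rough illustration of what an n-gram to frequency map holds, the sketch below counts n-grams by sliding a window over a token list. It is not this class's implementation: `NGramCountSketch` and `count` are hypothetical names, and n-grams are represented as space-joined strings rather than `NGram` objects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: n-grams are plain space-joined strings here,
// not the library's NGram type.
class NGramCountSketch {

    // Slide a window of n tokens over the text and count each window.
    static Map<String, Integer> count(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum); // insert with 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        // prints {to be=2, be or=1, or not=1, not to=1}
        System.out.println(count(tokens, 2));
    }
}
```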
    • Constructor Detail

      • NGramMap

        public NGramMap()
        Default constructor
      • NGramMap

        public NGramMap(int n)
        Create an n-gram to frequency map.
        Parameters:
        n - the degree of the NGrams in the map
      • NGramMap

        public NGramMap(int n,
                        Map<NGram,Integer> map)
        Create an n-gram to frequency map.
        Parameters:
        n - the degree of the NGrams in the map
        map - the mappings to use
      • NGramMap

        public NGramMap(int n,
                        Map<NGram,Integer> map,
                        int textSize)
        Create an n-gram to frequency map.
        Parameters:
        n - the degree of the NGrams in the map
        map - the mappings to use
        textSize - the number of elements in the original text
      • NGramMap

        public NGramMap(NGramMap map)
        Copy an n-gram to frequency map.
        Parameters:
        map - the n-gram map to copy
    • Method Detail

      • getN

        public int getN()
        Get the degree of the n-grams in this map.
        Returns:
        the n used by the NGrams in this map
      • getOrigTextSize

        public int getOrigTextSize()
        Get the total number of elements in the original text.
        Returns:
        the origTextSize used to calculate the relative frequencies
      • setOrigTextSize

        public void setOrigTextSize(int origTextSize)
        Set the total number of elements in the original text.
        Parameters:
        origTextSize - the total number of elements in the original text
      • getNGramList

        public List<NGram> getNGramList()
        Get an ordered list of the n-grams contained in the map.
        Returns:
        the list of NGrams
      • getFrequencyList

        public List<Integer> getFrequencyList()
        Get an ordered list of the absolute frequencies contained in the map.
        Returns:
        the list of frequencies
      • getProbabilityList

        public List<Double> getProbabilityList()
        Get an ordered list of the relative frequencies contained in the map.
        Returns:
        the list of relative frequencies
      • getMap

        public Map<NGram,Integer> getMap()
        Get a copy of the map that backs this object.
        Returns:
        a map of NGrams to Integers
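Returning a copy means callers cannot modify the backing map through the returned object. A minimal sketch of that pattern, assuming a `HashMap` backing store and string keys (both are assumptions; `DefensiveCopySketch` is a hypothetical class, not part of this API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of returning a copy instead of the backing map.
class DefensiveCopySketch {
    private final Map<String, Integer> counts = new HashMap<>();

    void increase(String gram) {
        counts.merge(gram, 1, Integer::sum);
    }

    // Return a snapshot; changes to it do not reach the backing map.
    Map<String, Integer> getMap() {
        return new HashMap<>(counts);
    }
}
```

Clearing or editing the returned map leaves the object's own counts untouched.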
      • calcOrigTextSize

        public int calcOrigTextSize()
        Calculates the total number of elements in the original text from the current contents of the map. The result is accurate only if the map still holds every entry of the original model created from the text; if entries have been removed, the total will be too low.
        Returns:
        the total of all elements in the map
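The caveat about subsets can be seen in a small sketch that sums the stored frequencies. This assumes the calculation is a plain sum over the map's values, which the description suggests but does not spell out; `TextSizeSketch` and `totalCount` are hypothetical names, with string keys standing in for `NGram`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: derive the text size by summing absolute frequencies.
class TextSizeSketch {

    // Sum of all absolute frequencies currently stored in the map.
    static int totalCount(Map<String, Integer> counts) {
        int total = 0;
        for (int frequency : counts.values()) {
            total += frequency;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 3);
        counts.put("b", 1);
        counts.put("c", 1);
        System.out.println(totalCount(counts)); // prints 5

        counts.remove("c");                      // map is now a subset
        System.out.println(totalCount(counts)); // prints 4, not the original 5
    }
}
```

Once an entry is removed, the recomputed total shrinks, so any relative frequency derived from it is inflated. Recording the original size with `setOrigTextSize` before filtering avoids that.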
      • iterator

        public Iterator<Map.Entry<NGram,Integer>> iterator()
        Get an iterator over the entries in the map.
        Returns:
        an iterator over the map entries
      • getFrequency

        public int getFrequency(NGram ngram)
        Get the absolute frequency of an NGram in the map.
        Parameters:
        ngram - the n-gram to get the frequency of
        Returns:
        the absolute frequency of the n-gram
      • getProbability

        public double getProbability(NGram ngram)
        Get the relative frequency of an n-gram in the map. If origTextSize has not been set, it is calculated from the current contents of the map.
        Parameters:
        ngram - the n-gram to get the relative frequency of
        Returns:
        the relative frequency of the n-gram
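The relative frequency is the absolute frequency divided by the text size. The sketch below shows one way the fallback could work; it assumes an unset size is represented by a value of 0 or less, which is a guess made for illustration and not documented behaviour. `ProbabilitySketch` and `relativeFrequency` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of a relative-frequency lookup with a fallback total.
class ProbabilitySketch {

    static double relativeFrequency(Map<String, Integer> counts,
                                    String gram,
                                    int origTextSize) {
        // Assumption: a non-positive origTextSize means "not set".
        int total = origTextSize;
        if (total <= 0) {
            total = 0;
            for (int frequency : counts.values()) {
                total += frequency;
            }
        }
        if (total == 0) {
            return 0.0; // empty map: avoid dividing by zero
        }
        // Cast before dividing so the result is not truncated to an int.
        return counts.getOrDefault(gram, 0) / (double) total;
    }
}
```

With counts `{a=3, b=1}`, the call `relativeFrequency(counts, "a", 0)` falls back to a total of 4, while `relativeFrequency(counts, "a", 10)` divides by the supplied size.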
      • increaseFreq

        public boolean increaseFreq(NGram nGram)
        Adds an n-gram to the map or increases the frequency if it is already present.
        Parameters:
        nGram - element to increase the frequency of in the map
        Returns:
        true if n-gram is valid and could be incremented
      • decreaseFreq

        public boolean decreaseFreq(NGram nGram)
        Decreases the frequency of an n-gram if its absolute frequency is greater than one; otherwise removes the n-gram from the map.
        Parameters:
        nGram - element to decrease the frequency of in the map
        Returns:
        true if n-gram is valid and could be decremented
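The increase and decrease semantics described above can be sketched with `Map.merge` and `Map.computeIfPresent` from the standard library. This is an illustration only: string keys stand in for `NGram`, the class name is hypothetical, and treating "valid" as "present in the map" for the decrement is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of add-or-increment and decrement-or-remove.
class FrequencyUpdateSketch {

    // Insert the n-gram with frequency 1, or add 1 to its current frequency.
    static void increase(Map<String, Integer> counts, String gram) {
        counts.merge(gram, 1, Integer::sum);
    }

    // Subtract 1 if the frequency is above 1; otherwise drop the entry.
    // Assumption: an n-gram that is absent cannot be decremented.
    static boolean decrease(Map<String, Integer> counts, String gram) {
        if (!counts.containsKey(gram)) {
            return false;
        }
        // Returning null from the remapping function removes the mapping.
        counts.computeIfPresent(gram, (key, value) -> value > 1 ? value - 1 : null);
        return true;
    }
}
```

Removing the entry at a frequency of one keeps zero-count n-grams out of the map, so the key set always reflects n-grams that were actually observed.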
      • removeNGram

        public int removeNGram(NGram nGram)
        Removes an n-gram from the map.
        Parameters:
        nGram - element to remove from the map
        Returns:
        the frequency previously associated with the n-gram or null
      • filterByThreshold

        public void filterByThreshold(int min,
                                      int max)
        Filters this n-gram map, keeping only the n-grams whose frequency lies between min and max, inclusive.
        Parameters:
        min - the smallest frequency to keep
        max - the largest frequency to keep
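An in-place threshold filter of this kind can be written with `Collection.removeIf` on the map's value view. A minimal sketch under the same stand-in assumptions as before (string keys, hypothetical class name):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an inclusive frequency-range filter.
class ThresholdFilterSketch {

    // Drop every entry whose frequency is below min or above max.
    static void filterByThreshold(Map<String, Integer> counts, int min, int max) {
        // values() is a live view, so removing from it removes map entries.
        counts.values().removeIf(frequency -> frequency < min || frequency > max);
    }
}
```

For counts `{a=1, b=5, c=10}`, calling `filterByThreshold(counts, 2, 5)` leaves only `b`, because both bounds are inclusive.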
      • filterByTotal

        public void filterByTotal(int total)
        Filters this n-gram map, keeping only the n-grams with the highest frequencies.
        Parameters:
        total - number of top frequencies to keep
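One possible reading of this filter is "keep the `total` entries with the highest counts". The sketch below implements that reading with a stream sort. How ties are broken, and whether `total` counts entries or distinct frequency values, is not specified here, so both are assumptions; the class name and string keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of a top-k filter applied in place.
class TopFrequencySketch {

    static void filterByTotal(Map<String, Integer> counts, int total) {
        // Collect the keys of the `total` highest-frequency entries.
        Set<String> keep = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(Math.max(0, total)) // limit() rejects negative sizes
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        // keySet() is a live view, so this removes all other entries.
        counts.keySet().retainAll(keep);
    }
}
```

When several n-grams share the cut-off frequency, which of them survive depends on the map's iteration order in this sketch.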
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object