Class NGramModel

  • All Implemented Interfaces:
    java.lang.Iterable<StringList>

    public class NGramModel
    extends java.lang.Object
    implements java.lang.Iterable<StringList>
    The NGramModel can be used to create ngrams and character ngrams.
    See Also:
    StringList
    • Constructor Summary

      Constructors 
      Constructor Description
      NGramModel()
      Initializes an empty instance.
      NGramModel​(java.io.InputStream in)
      Initializes the current instance from a serialized model stream.
    • Method Summary

      Modifier and Type Method Description
      void add​(java.lang.CharSequence chars, int minLength, int maxLength)
      Adds character NGrams to the current instance.
      void add​(StringList ngram)
      Adds one NGram; if it already exists, its count increases by one.
      void add​(StringList ngram, int minLength, int maxLength)
      Adds NGrams up to the specified length to the current instance.
      boolean contains​(StringList tokens)
      Checks if the given tokens are contained by the current instance.
      void cutoff​(int cutoffUnder, int cutoffOver)
      Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
      boolean equals​(java.lang.Object obj)  
      int getCount​(StringList ngram)
      Retrieves the count of the given ngram.
      int hashCode()  
      java.util.Iterator<StringList> iterator()
      Retrieves an Iterator over all StringList entries.
      int numberOfGrams()
      Retrieves the total count of all Ngrams.
      void remove​(StringList tokens)
      Removes the specified tokens from the NGram model; they are just dropped.
      void serialize​(java.io.OutputStream out)
      Writes the ngram instance to the given OutputStream.
      void setCount​(StringList ngram, int count)
      Sets the count of an existing ngram.
      int size()
      Retrieves the number of StringList entries in the current instance.
      Dictionary toDictionary()
      Creates a dictionary which contains all StringLists which are in the current NGramModel.
      Dictionary toDictionary​(boolean caseSensitive)
      Creates a dictionary which contains all StringLists which are in the current NGramModel.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Constructor Detail

      • NGramModel

        public NGramModel()
        Initializes an empty instance.
      • NGramModel

        public NGramModel​(java.io.InputStream in)
                   throws java.io.IOException
        Initializes the current instance from a serialized model stream.
        Parameters:
        in - the serialized model stream
        Throws:
        java.io.IOException
    • Method Detail

      • getCount

        public int getCount​(StringList ngram)
        Retrieves the count of the given ngram.
        Parameters:
        ngram - an ngram
        Returns:
        count of the ngram or 0 if it is not contained
      • setCount

        public void setCount​(StringList ngram,
                             int count)
        Sets the count of an existing ngram.
        Parameters:
        ngram - the ngram whose count is set
        count - the new count
      • add

        public void add​(StringList ngram)
        Adds one NGram; if it already exists, its count increases by one.
        Parameters:
        ngram - the ngram to add
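        The counting behaviour described for add and getCount can be modelled with a plain map. The following is a minimal sketch, not OpenNLP code: the class NGramCountSketch is hypothetical, and a List<String> stands in for StringList.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that models the documented counting behaviour; not OpenNLP code.
class NGramCountSketch {
    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one n-gram; an existing entry has its count increased by one.
    void add(List<String> ngram) {
        counts.merge(List.copyOf(ngram), 1, Integer::sum);
    }

    // Returns the stored count, or 0 if the n-gram is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        NGramCountSketch model = new NGramCountSketch();
        model.add(List.of("new", "york"));
        model.add(List.of("new", "york"));
        System.out.println(model.getCount(List.of("new", "york"))); // prints 2
        System.out.println(model.getCount(List.of("old", "york"))); // prints 0
    }
}
```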
      • add

        public void add​(StringList ngram,
                        int minLength,
                        int maxLength)
        Adds NGrams up to the specified length to the current instance.
        Parameters:
        ngram - the tokens to build the uni-grams, bi-grams, tri-grams, .. from.
        minLength - minimal length
        maxLength - maximal length
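        To illustrate which ngrams a call with minLength and maxLength would be expected to produce, here is a minimal sketch in plain Java. The class TokenNGramSketch and its ngrams method are hypothetical, a List<String> stands in for StringList, and the argument check is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of building token n-grams of several lengths; not OpenNLP code.
class TokenNGramSketch {
    // Returns every contiguous token window whose length lies in [minLength, maxLength].
    static List<List<String>> ngrams(List<String> tokens, int minLength, int maxLength) {
        if (minLength < 1 || minLength > maxLength) {
            // Assumed validation; the page above does not state how invalid lengths are handled.
            throw new IllegalArgumentException("require 1 <= minLength <= maxLength");
        }
        List<List<String>> result = new ArrayList<>();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                result.add(List.copyOf(tokens.subList(start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[a], [b], [c], [a, b], [b, c]]
        System.out.println(ngrams(List.of("a", "b", "c"), 1, 2));
    }
}
```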
      • add

        public void add​(java.lang.CharSequence chars,
                        int minLength,
                        int maxLength)
        Adds character NGrams to the current instance.
        Parameters:
        chars - the character sequence to build the ngrams from
        minLength - minimal length
        maxLength - maximal length
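        Character ngrams are substrings of a fixed length taken at every position of the input. A minimal sketch of that idea follows; the class CharNGramSketch is hypothetical and makes no claim about any normalisation (such as case folding) that the library itself may apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-gram extraction; not OpenNLP code.
class CharNGramSketch {
    // Returns all substrings of chars whose length lies in [minLength, maxLength].
    static List<String> charNGrams(CharSequence chars, int minLength, int maxLength) {
        List<String> result = new ArrayList<>();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= chars.length(); start++) {
                result.add(chars.subSequence(start, start + n).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ab, bc, abc]
        System.out.println(charNGrams("abc", 2, 3));
    }
}
```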
      • remove

        public void remove​(StringList tokens)
        Removes the specified tokens from the NGram model; they are just dropped.
        Parameters:
        tokens - the ngram to remove
      • contains

        public boolean contains​(StringList tokens)
        Checks if the given tokens are contained by the current instance.
        Parameters:
        tokens - the ngram to look up
        Returns:
        true if the ngram is contained
      • size

        public int size()
        Retrieves the number of StringList entries in the current instance.
        Returns:
        number of different grams
      • iterator

        public java.util.Iterator<StringList> iterator()
        Retrieves an Iterator over all StringList entries.
        Specified by:
        iterator in interface java.lang.Iterable<StringList>
        Returns:
        iterator over all grams
      • numberOfGrams

        public int numberOfGrams()
        Retrieves the total count of all Ngrams.
        Returns:
        total count of all ngrams
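        size() and numberOfGrams() answer different questions: the first counts distinct entries, the second sums their counts. The sketch below shows the distinction on a plain map; GramTotalsSketch and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "distinct entries" versus "total count"; not OpenNLP code.
class GramTotalsSketch {
    // Number of different n-grams, analogous to size().
    static int distinct(Map<List<String>, Integer> counts) {
        return counts.size();
    }

    // Sum of the counts of all n-grams, analogous to numberOfGrams().
    static int total(Map<List<String>, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> counts = Map.of(List.of("a"), 3, List.of("a", "b"), 2);
        System.out.println(distinct(counts)); // prints 2
        System.out.println(total(counts));    // prints 5
    }
}
```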
      • cutoff

        public void cutoff​(int cutoffUnder,
                           int cutoffOver)
        Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
        Parameters:
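        cutoffUnder - ngrams which appear less often than this value are deleted
        cutoffOver - ngrams which appear more often than this value are deleted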
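        A minimal sketch of the cutoff rule, assuming an entry is dropped when its count is below cutoffUnder or above cutoffOver, and kept when it equals either bound. CutoffSketch is a hypothetical class operating on a plain map with String keys for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a frequency cutoff; not OpenNLP code.
class CutoffSketch {
    // Removes entries whose count is < cutoffUnder or > cutoffOver (assumed reading of the description).
    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(c -> c < cutoffUnder || c > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("common", 5);
        counts.put("stopword", 100);
        cutoff(counts, 2, 50);
        System.out.println(counts); // prints {common=5}
    }
}
```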
      • toDictionary

        public Dictionary toDictionary()
        Creates a dictionary which contains all StringLists which are in the current NGramModel. Entries which differ only in case are merged into one. Calling this method is the same as calling toDictionary(boolean) with false.
        Returns:
        a dictionary of the ngrams
      • toDictionary

        public Dictionary toDictionary​(boolean caseSensitive)
        Creates a dictionary which contains all StringLists which are in the current NGramModel.
        Parameters:
        caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
        Returns:
        a dictionary of the ngrams
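        The effect of the caseSensitive flag can be pictured as follows. This is only a sketch: CaseMergeSketch is hypothetical, and lower-casing the tokens is one assumed way to model the merging of entries that differ only in case, not necessarily how Dictionary does it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of case-sensitive versus case-insensitive entry sets; not OpenNLP code.
class CaseMergeSketch {
    // Collects distinct n-grams; when caseSensitive is false, entries differing only in case collapse.
    static Set<List<String>> entries(Collection<List<String>> ngrams, boolean caseSensitive) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (List<String> ngram : ngrams) {
            List<String> key = new ArrayList<>();
            for (String token : ngram) {
                // Assumed normalisation: lower-case tokens to merge case variants.
                key.add(caseSensitive ? token : token.toLowerCase(Locale.ROOT));
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> grams = List.of(List.of("New", "York"), List.of("new", "york"));
        System.out.println(entries(grams, true).size());  // prints 2
        System.out.println(entries(grams, false).size()); // prints 1
    }
}
```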
      • serialize

        public void serialize​(java.io.OutputStream out)
                       throws java.io.IOException
        Writes the ngram instance to the given OutputStream.
        Parameters:
        out - the stream to write the model to
        Throws:
        java.io.IOException - if an I/O error occurs during writing
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object