Class UAX29URLEmailTokenizerImpl31

  • All Implemented Interfaces:
    StandardTokenizerInterface

    @Deprecated
    public final class UAX29URLEmailTokenizerImpl31
    extends java.lang.Object
    implements StandardTokenizerInterface
    Deprecated.
    This class is only for exact backwards compatibility.
    This class implements UAX29URLEmailTokenizer, except with a bug (https://issues.apache.org/jira/browse/LUCENE-3358) where Han and Hiragana characters would be split from combining characters.
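
To make the affected input concrete, the sketch below uses only JDK classes (not Lucene) to detect a Han or Hiragana code point that is immediately followed by a combining mark, which is the kind of sequence the linked issue describes as being split. The class name, method names, and sample string are illustrative assumptions. The sketch only inspects code points and does not reproduce the tokenizer's behavior.

```java
final class CombiningSequenceDemo {
    // HIRAGANA LETTER KA (U+304B) followed by
    // COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099).
    static final String SAMPLE = "\u304B\u3099";

    /** True if the code point has general category Mn, Mc, or Me. */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /** True if the code point belongs to the Han or Hiragana script. */
    static boolean isHanOrHiragana(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA;
    }

    /** True if a Han/Hiragana code point is directly followed by a combining mark. */
    static boolean hasAffectedSequence(String text) {
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            if (isHanOrHiragana(cps[i]) && isCombiningMark(cps[i + 1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasAffectedSequence(SAMPLE));
        // The precomposed letter GA (U+304C) carries no separate combining mark.
        System.out.println(hasAffectedSequence("\u304C"));
    }
}
```

A check like this could help decide whether indexed text is likely to tokenize differently between this compatibility class and the corrected implementation.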
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int EMAIL_TYPE
      Deprecated.
       
      static int HANGUL_TYPE
      Deprecated.
       
      static int HIRAGANA_TYPE
      Deprecated.
       
      static int IDEOGRAPHIC_TYPE
      Deprecated.
       
      static int KATAKANA_TYPE
      Deprecated.
       
      static int NUMERIC_TYPE
      Deprecated.
      Numbers
      static int SOUTH_EAST_ASIAN_TYPE
      Deprecated.
      Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.).
      static int URL_TYPE
      Deprecated.
       
      static int WORD_TYPE
      Deprecated.
      Alphanumeric sequences
      static int YYEOF
      Deprecated.
      This character denotes the end of file
      static int YYINITIAL
      Deprecated.
      lexical states
    • Constructor Summary

      Constructors 
      Constructor Description
      UAX29URLEmailTokenizerImpl31​(java.io.Reader in)
      Deprecated.
      Creates a new scanner
    • Method Summary

      All Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      int getNextToken()
      Deprecated.
      Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
      void getText​(CharTermAttribute t)
      Deprecated.
      Fills CharTermAttribute with the current token text.
      void yybegin​(int newState)
      Deprecated.
      Enters a new lexical state
      int yychar()
      Deprecated.
      Returns the current position.
      char yycharat​(int pos)
      Deprecated.
      Returns the character at position pos from the matched text.
      void yyclose()
      Deprecated.
      Closes the input stream.
      int yylength()
      Deprecated.
      Returns the length of the matched text region.
      void yypushback​(int number)
      Deprecated.
      Pushes the specified amount of characters back into the input stream.
      void yyreset​(java.io.Reader reader)
      Deprecated.
      Resets the scanner to read from a new input stream.
      int yystate()
      Deprecated.
      Returns the current lexical state.
      java.lang.String yytext()
      Deprecated.
      Returns the text matched by the current regular expression.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • YYEOF

        public static final int YYEOF
        Deprecated.
        This character denotes the end of file
        See Also:
        Constant Field Values
      • YYINITIAL

        public static final int YYINITIAL
        Deprecated.
        lexical states
        See Also:
        Constant Field Values
      • WORD_TYPE

        public static final int WORD_TYPE
        Deprecated.
        Alphanumeric sequences
        See Also:
        Constant Field Values
      • SOUTH_EAST_ASIAN_TYPE

        public static final int SOUTH_EAST_ASIAN_TYPE
        Deprecated.
        Chars in class \p{Line_Break = Complex_Context} are from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.

        See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA

        See Also:
        Constant Field Values
    • Constructor Detail

      • UAX29URLEmailTokenizerImpl31

        public UAX29URLEmailTokenizerImpl31​(java.io.Reader in)
        Deprecated.
        Creates a new scanner
        Parameters:
        in - the java.io.Reader to read input from.
    • Method Detail

      • yyclose

        public final void yyclose()
                           throws java.io.IOException
        Deprecated.
        Closes the input stream.
        Throws:
        java.io.IOException
      • yyreset

        public final void yyreset​(java.io.Reader reader)
        Deprecated.
        Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset, and the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL. The internal scan buffer is resized down to its initial length if it has grown.
        Specified by:
        yyreset in interface StandardTokenizerInterface
        Parameters:
        reader - the new input stream
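
The reset contract described above can be sketched with a small stand-in class. This is not the generated scanner: `ResetSketch`, its tiny `INITIAL_BUFFER`, the value chosen for `YYINITIAL`, and the `fill`, `buffered`, `bufferLength`, and `isOpen` helpers are all assumptions made for illustration. It shows the three documented effects: the old reader is left open, buffered input is discarded, and a grown buffer shrinks back to its initial length while the lexical state returns to the initial one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ResetSketch {
    static final int YYINITIAL = 0;       // assumed value for this stand-in
    static final int INITIAL_BUFFER = 4;  // deliberately tiny so growth is visible

    private Reader in;
    private char[] buffer = new char[INITIAL_BUFFER];
    private int filled;
    private int state = YYINITIAL;

    ResetSketch(Reader in) { this.in = in; }

    /** Reads all remaining input into the buffer, doubling it when full. */
    void fill() {
        try {
            while (true) {
                if (filled == buffer.length) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                int n = in.read(buffer, filled, buffer.length - filled);
                if (n == -1) break;
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void yybegin(int newState) { state = newState; }
    int yystate() { return state; }
    int bufferLength() { return buffer.length; }
    String buffered() { return new String(buffer, 0, filled); }

    /** Switches to a new reader without closing the old one. */
    void yyreset(Reader reader) {
        in = reader;
        filled = 0;                       // buffered input is discarded
        state = YYINITIAL;                // back to the initial lexical state
        if (buffer.length > INITIAL_BUFFER) {
            buffer = new char[INITIAL_BUFFER];
        }
    }

    /** Helper: a closed StringReader throws IOException from ready(). */
    static boolean isOpen(Reader r) {
        try { r.ready(); return true; } catch (IOException e) { return false; }
    }

    public static void main(String[] args) {
        StringReader first = new StringReader("0123456789");
        ResetSketch s = new ResetSketch(first);
        s.fill();
        s.yyreset(new StringReader("ab"));
        System.out.println(s.bufferLength() + " " + isOpen(first));
    }
}
```

Because the old reader is not closed by the reset, the caller remains responsible for closing it.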
      • yystate

        public final int yystate()
        Deprecated.
        Returns the current lexical state.
      • yybegin

        public final void yybegin​(int newState)
        Deprecated.
        Enters a new lexical state
        Parameters:
        newState - the new lexical state
      • yytext

        public final java.lang.String yytext()
        Deprecated.
        Returns the text matched by the current regular expression.
      • yycharat

        public final char yycharat​(int pos)
        Deprecated.
        Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster.
        Parameters:
        pos - the position of the character to fetch. A value from 0 to yylength()-1.
        Returns:
        the character at position pos
      • yypushback

        public void yypushback​(int number)
        Deprecated.
        Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.
        Parameters:
        number - the number of characters to be read again. This number must not be greater than yylength()!
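
The pushback semantics can be sketched with a stand-in scanner that tracks the current match as a pair of offsets. `PushbackSketch`, its `next()` method, and its space-delimited matching rule are assumptions for illustration and are unrelated to the UAX#29 grammar. Only the relationship between `yypushback`, `yylength`, `yytext`, and `yycharat` is meant to mirror the descriptions here.

```java
final class PushbackSketch {
    private final String input;
    private int start;  // offset of the first character of the current match
    private int end;    // offset one past the last character of the current match

    PushbackSketch(String input) { this.input = input; }

    /** Matches the next run of non-space characters; false at end of input. */
    boolean next() {
        int i = end;
        while (i < input.length() && input.charAt(i) == ' ') i++;
        if (i == input.length()) {
            start = end = i;
            return false;
        }
        start = i;
        while (i < input.length() && input.charAt(i) != ' ') i++;
        end = i;
        return true;
    }

    int yylength() { return end - start; }
    String yytext() { return input.substring(start, end); }
    char yycharat(int pos) { return input.charAt(start + pos); }

    /** Shortens the current match so the last characters are scanned again. */
    void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("pushback larger than match: " + number);
        }
        end -= number;
    }

    public static void main(String[] args) {
        PushbackSketch s = new PushbackSketch("foo.bar baz");
        s.next();            // current match is "foo.bar"
        s.yypushback(4);     // current match shrinks to "foo"
        System.out.println(s.yytext());
        s.next();            // the pushed-back ".bar" is matched again
        System.out.println(s.yytext());
    }
}
```

In this sketch a pushback larger than the match is rejected with an exception; how the generated scanner reports that error is not stated in this page.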
      • getNextToken

        public int getNextToken()
                         throws java.io.IOException
        Deprecated.
        Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
        Specified by:
        getNextToken in interface StandardTokenizerInterface
        Returns:
        the next token
        Throws:
        java.io.IOException - if any I/O-Error occurs
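
A typical way to drive a scanner of this shape is to call getNextToken() in a loop until the end-of-file constant is returned, reading the matched text after each call. The sketch below shows that loop against a hypothetical stand-in, `WhitespaceScanner`, so that it runs without Lucene on the classpath. The stand-in splits on whitespace only, and the values given to `YYEOF` and `WORD_TYPE` are assumptions. With Lucene available, the scanner would instead be constructed as `new UAX29URLEmailTokenizerImpl31(reader)`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenLoopSketch {
    static final int YYEOF = -1;     // assumed end-of-input sentinel for the stand-in
    static final int WORD_TYPE = 0;  // assumed token type value for the stand-in

    /** Hypothetical stand-in scanner: whitespace-delimited, NOT the UAX#29 grammar. */
    static final class WhitespaceScanner {
        private final Reader in;
        private final StringBuilder text = new StringBuilder();

        WhitespaceScanner(Reader in) { this.in = in; }

        int getNextToken() throws IOException {
            text.setLength(0);
            int c;
            // Skip leading whitespace.
            while ((c = in.read()) != -1 && Character.isWhitespace(c)) { }
            if (c == -1) return YYEOF;
            // Accumulate characters up to the next whitespace or end of input.
            do {
                text.append((char) c);
            } while ((c = in.read()) != -1 && !Character.isWhitespace(c));
            return WORD_TYPE;
        }

        String yytext() { return text.toString(); }
    }

    /** Drains the scanner: one getNextToken() call per token until YYEOF. */
    static List<String> collect(Reader reader) throws IOException {
        WhitespaceScanner scanner = new WhitespaceScanner(reader);
        List<String> tokens = new ArrayList<>();
        while (scanner.getNextToken() != YYEOF) {
            tokens.add(scanner.yytext());
        }
        return tokens;
    }

    /** Convenience wrapper that converts the checked IOException. */
    static List<String> collectFromString(String s) {
        try {
            return collect(new StringReader(s));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectFromString("mail me at user@example.com"));
    }
}
```

The real scanner returns one of the token type constants listed in the field summary, so a caller can branch on the returned value rather than ignoring it as this loop does.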