Class UniCharIterator


  • public class UniCharIterator
    extends java.lang.Object
    Allow iteration by Unicode characters over a Java UTF-16 encoded string.

    A Java character is only a 16-bit quantity. Unicode characters can have values up to 0x10FFFF, which exceeds the space available in a Java character. When such Unicode characters appear in a Java string they are encoded using the UTF-16 encoding and occupy two consecutive Java characters, known as a surrogate pair.
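
    The surrogate pair arithmetic can be illustrated with a short sketch. It uses only standard java.lang calls; the class SurrogatePairDemo and its encode method are hypothetical names introduced for illustration and are not part of UniCharIterator.

```java
import java.util.Arrays;

// Illustrative sketch only: SurrogatePairDemo is a hypothetical helper,
// not part of the UniCharIterator API.
class SurrogatePairDemo {
    /** Encode a code point in the range 0x10000..0x10FFFF as two Java chars. */
    static char[] encode(int codePoint) {
        int offset = codePoint - 0x10000;                // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600;
        char[] pair = encode(codePoint);
        String s = new String(pair);
        // One Unicode character, but two Java characters.
        System.out.println("Java chars: " + s.length());
        System.out.println("Unicode characters: " + s.codePointCount(0, s.length()));
        // The standard library should produce the same pair.
        System.out.println("Matches Character.toChars: "
                + Arrays.equals(pair, Character.toChars(codePoint)));
    }
}
```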

    This class allows the caller to step through a Java string in true Unicode character amounts. It also provides some static methods to generate Java characters from Unicode characters.

    An iterator instance is associated with an instance of the Java CharSequence interface. This interface is implemented by both the String and StringBuilder classes.

    At any given time, one can think of the iterator as being positioned between characters in the associated character sequence. It can also be positioned before the first character and after the last. Operations move the iterator forward or backward in the underlying character sequence and return the Unicode character passed over.

    The iterator carries an index number that can be useful for indexing into the character sequence independently of the iterator. Index values start at zero and count up to the number of Java characters in the sequence. Index zero is before the first character, index one is between the first and second characters, and so on.

    It does not make sense for the iterator to be positioned between the two Java characters making up a surrogate pair. Subsequent operations could lead to assertion errors and unpredictable results.
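
    The index model described above can be checked with a small sketch. PositionDemo and isCharacterBoundary are hypothetical names used here for illustration; the class documented on this page does not necessarily perform this check itself.

```java
// Illustrative sketch only: PositionDemo is a hypothetical helper,
// not part of the UniCharIterator API.
class PositionDemo {
    /**
     * Report whether an index is a sensible iterator position, that is,
     * within range and not between the two halves of a surrogate pair.
     */
    static boolean isCharacterBoundary(CharSequence text, int index) {
        if (index < 0 || index > text.length()) {
            return false;                       // outside the sequence
        }
        if (index == 0 || index == text.length()) {
            return true;                        // before the first or after the last character
        }
        // Invalid only when a high surrogate is immediately followed by a low surrogate.
        return !(Character.isHighSurrogate(text.charAt(index - 1))
                && Character.isLowSurrogate(text.charAt(index)));
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 as a surrogate pair, then 'b': four Java characters.
        String text = "a\uD83D\uDE00b";
        for (int i = 0; i <= text.length(); i++) {
            System.out.println("index " + i + " valid: " + isCharacterBoundary(text, i));
        }
    }
}
```

    In this sample, index 2 falls between the two Java characters of the surrogate pair, so it is the one position an iterator should not be given.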

    Note: The iterator caches the length of the given character sequence. If the caller is using an iterator and modifies the sequence in such a way that its length changes, it must call an attach() overload to re-establish the length.

    • Constructor Summary

      Constructors 
      Constructor Description
      UniCharIterator()
      Default constructor.
      UniCharIterator​(java.lang.CharSequence charSequence)
      Construct an iterator associated with a given character sequence.
      UniCharIterator​(java.lang.CharSequence charSequence, int index)
      Construct an iterator associated with a given sequence and initially positioned at a specified index.
    • Method Summary

      Modifier and Type Method Description
      static void append​(java.lang.StringBuilder s, int c)
      Append a Unicode character to a Java StringBuilder.
      void attach​(java.lang.CharSequence charSequence)
      Attach the iterator to a given character sequence.
      void attach​(java.lang.CharSequence charSequence, int index)
      Attach the iterator to a given sequence and position it at a specified index.
      int getIndex()
      Get the current Java character index number of the iterator.
      boolean isAtEnd()
      Query whether the iterator is at the end of the text.
      boolean isAtStart()
      Query whether the iterator is at the start of the text.
      int next()
      Advance the iterator by one Unicode character.
      int prev()
      Back up the iterator by one Unicode character.
      void setIndex​(int index)
      Set the iterator's index.
      static java.lang.String toString​(int c)
      Return a Java string that represents the given Unicode character.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • UniCharIterator

        public UniCharIterator()
        Default constructor.

        The iterator is not associated with any character sequence, and is not particularly useful until the attach() method is called.

      • UniCharIterator

        public UniCharIterator​(java.lang.CharSequence charSequence)
        Construct an iterator associated with a given character sequence. The iterator is initially positioned before the first character in the sequence.
        Parameters:
        charSequence - Character sequence to associate the iterator with.
      • UniCharIterator

        public UniCharIterator​(java.lang.CharSequence charSequence,
                               int index)
        Construct an iterator associated with a given sequence and initially positioned at a specified index.
        Parameters:
        charSequence - Character sequence to associate the iterator with.
        index - Index number into the character sequence, with meaning as described above.
    • Method Detail

      • append

        public static void append​(java.lang.StringBuilder s,
                                  int c)
        Append a Unicode character to a Java StringBuilder. This method determines whether the Unicode character can be represented as a single Java character or must be a surrogate pair. It then appends the appropriate Java character(s) to the given StringBuilder.
        Parameters:
        s - StringBuilder to append to.
        c - Unicode character to be added.
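
        The decision described above can be sketched with standard java.lang.Character calls. This is an assumption about the general approach, not the actual implementation; AppendDemo and appendUnicode are hypothetical names.

```java
// Illustrative sketch only: AppendDemo is a hypothetical helper and does not
// reproduce the real UniCharIterator.append() implementation.
class AppendDemo {
    /** Append one Unicode character as either one or two Java characters. */
    static void appendUnicode(StringBuilder sb, int c) {
        if (Character.isBmpCodePoint(c)) {
            sb.append((char) c);                       // fits in a single Java char
        } else {
            sb.append(Character.highSurrogate(c));     // first half of the pair
            sb.append(Character.lowSurrogate(c));      // second half of the pair
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        appendUnicode(sb, 'A');
        appendUnicode(sb, 0x1F600);
        System.out.println("Java chars appended: " + sb.length());

        // The standard StringBuilder.appendCodePoint() should give the same content.
        StringBuilder expected = new StringBuilder().appendCodePoint('A').appendCodePoint(0x1F600);
        System.out.println("Same as appendCodePoint: " + sb.toString().contentEquals(expected));
    }
}
```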
      • attach

        public void attach​(java.lang.CharSequence charSequence)
        Attach the iterator to a given character sequence. The iterator is initially positioned before the first character in the sequence.
        Parameters:
        charSequence - Character sequence to associate the iterator with.
      • attach

        public void attach​(java.lang.CharSequence charSequence,
                           int index)
        Attach the iterator to a given sequence and position it at a specified index.
        Parameters:
        charSequence - Character sequence to associate the iterator with.
        index - Index number into the character sequence, with meaning as described above.
      • getIndex

        public int getIndex()
        Get the current Java character index number of the iterator.
        Returns:
        Index number, as described above.
      • isAtEnd

        public boolean isAtEnd()
        Query whether the iterator is at the end of the text.
        Returns:
        True if the iterator is positioned after the last character in the underlying text; false if not.
      • isAtStart

        public boolean isAtStart()
        Query whether the iterator is at the start of the text.
        Returns:
        True if the iterator is positioned before the first character in the underlying text; false if not.
      • next

        public int next()
        Advance the iterator by one Unicode character. The iterator will not be advanced if it is already positioned after the last Java character in the sequence. The iterator's index will increase by one or two, depending on the makeup of the Unicode character it advances over.
        Returns:
        Unicode character advanced over.
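
        The stepping behaviour of next() resembles the following sketch, which uses only standard java.lang.Character methods rather than this class. ForwardWalkDemo and walkForward are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: ForwardWalkDemo is a hypothetical stand-in that
// mimics forward stepping with standard library calls.
class ForwardWalkDemo {
    /** Collect the Unicode characters of a sequence, front to back. */
    static List<Integer> walkForward(CharSequence text) {
        List<Integer> result = new ArrayList<>();
        int index = 0;                                   // before the first character
        while (index < text.length()) {                  // stop once past the last character
            int c = Character.codePointAt(text, index);  // reads one or two Java chars
            result.add(c);
            index += Character.charCount(c);             // advance the index by 1 or 2
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : walkForward("a\uD83D\uDE00b")) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase());
        }
    }
}
```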
      • prev

        public int prev()
        Back up the iterator by one Unicode character. The iterator will not be moved if it is already positioned before the first Java character in the sequence. The iterator's index will decrease by one or two, depending on the makeup of the Unicode character it moves over.
        Returns:
        Unicode character passed over.
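
        Backward stepping can be sketched in the same spirit using the standard Character.codePointBefore() method. BackwardWalkDemo and walkBackward are hypothetical names and do not belong to this class.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: BackwardWalkDemo is a hypothetical stand-in that
// mimics backward stepping with standard library calls.
class BackwardWalkDemo {
    /** Collect the Unicode characters of a sequence, back to front. */
    static List<Integer> walkBackward(CharSequence text) {
        List<Integer> result = new ArrayList<>();
        int index = text.length();                           // after the last character
        while (index > 0) {                                  // stop once before the first character
            int c = Character.codePointBefore(text, index);  // reads one or two Java chars
            result.add(c);
            index -= Character.charCount(c);                 // back up the index by 1 or 2
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : walkBackward("a\uD83D\uDE00b")) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase());
        }
    }
}
```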
      • setIndex

        public void setIndex​(int index)
        Set the iterator's index. This method changes the index, but keeps the iterator associated with the same character sequence.
        Parameters:
        index - New index to set for this iterator.
      • toString

        public static java.lang.String toString​(int c)
        Return a Java string that represents the given Unicode character.
        Parameters:
        c - Unicode character to convert to a Java string.
        Returns:
        Resulting String. If the character is less than 0x10000, the result will simply contain the single character passed in. Otherwise it will contain the two characters making up the surrogate pair.
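
        The standard library offers a comparable conversion, sketched below as a point of reference. ToStringDemo and unicodeToString are hypothetical names; whether this class is implemented the same way is an assumption.

```java
// Illustrative sketch only: ToStringDemo is a hypothetical helper showing a
// standard-library equivalent of the conversion described above.
class ToStringDemo {
    /** Convert one Unicode character to a String of one or two Java chars. */
    static String unicodeToString(int c) {
        return new String(Character.toChars(c));
    }

    public static void main(String[] args) {
        String basic = unicodeToString(0x41);            // below 0x10000: one Java char
        String supplementary = unicodeToString(0x1F600); // 0x10000 or above: surrogate pair
        System.out.println("Length below 0x10000: " + basic.length());
        System.out.println("Length at or above 0x10000: " + supplementary.length());
    }
}
```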