Class ArabicNormalizer


  • public class ArabicNormalizer
    extends java.lang.Object
    Normalizer for Arabic.

    Normalization is done in place for efficiency, operating directly on a term buffer (a char array).

    Normalization is defined as:

    • Normalization of hamza with alef seat to a bare alef.
    • Normalization of teh marbuta to heh.
    • Normalization of dotless yeh (alef maksura) to yeh.
    • Removal of Arabic diacritics (the harakat).
    • Removal of tatweel (stretching character).
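
The rules listed above can be illustrated with a minimal, standalone sketch. This is not the library's source: the class name ArabicNormalizerSketch, the constant names, and the exact set of code points (including treating alef with madda, U+0622, as a hamza-on-alef form, and taking the harakat to be the range U+064B through U+0652) are assumptions made for illustration. The sketch rewrites the char array in place and returns the new logical length.

```java
// Hypothetical illustration of the normalization rules; not the Lucene implementation.
final class ArabicNormalizerSketch {
    // Code points below are assumptions chosen to match the rules in the description.
    static final char ALEF = '\u0627';
    static final char ALEF_MADDA = '\u0622';
    static final char ALEF_HAMZA_ABOVE = '\u0623';
    static final char ALEF_HAMZA_BELOW = '\u0625';
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';
    static final char DOTLESS_YEH = '\u0649'; // alef maksura
    static final char YEH = '\u064A';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B';    // first harakat code point assumed here
    static final char SUKUN = '\u0652';       // last harakat code point assumed here

    /**
     * Normalizes the first len chars of s in place.
     * Returns the new length; chars at or beyond that index are stale.
     */
    static int normalize(char[] s, int len) {
        int out = 0; // write position; never runs ahead of the read position i
        for (int i = 0; i < len; i++) {
            char c = s[i];
            switch (c) {
                case ALEF_MADDA:
                case ALEF_HAMZA_ABOVE:
                case ALEF_HAMZA_BELOW:
                    s[out++] = ALEF;   // hamza with alef seat becomes bare alef
                    break;
                case TEH_MARBUTA:
                    s[out++] = HEH;    // teh marbuta becomes heh
                    break;
                case DOTLESS_YEH:
                    s[out++] = YEH;    // dotless yeh becomes yeh
                    break;
                case TATWEEL:
                    break;             // tatweel is dropped
                default:
                    if (c >= FATHATAN && c <= SUKUN) {
                        break;         // harakat are dropped
                    }
                    s[out++] = c;      // everything else is copied unchanged
            }
        }
        return out;
    }
}
```

Because removals shorten the term, a caller would read only the first n chars of the buffer, where n is the returned length, for example new String(buf, 0, n).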