Class ArabicAnalyzer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class ArabicAnalyzer
    extends StopwordAnalyzerBase
    Analyzer for Arabic.

    This analyzer implements light stemming as specified by: Light Stemming for Arabic Information Retrieval (http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf).
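
    As a rough illustration of what light stemming means, the sketch below strips at most one common prefix and one common suffix and never shortens a word below a minimum length. The class name LightStemSketch, the affix lists, and the minimum length are assumptions made for this example; it is not the algorithm from the paper and not the code this analyzer runs. Arabic letters are written as Unicode escapes with a transliteration in the comment.

```java
// Simplified light-stemming sketch: strip at most one known prefix and one
// known suffix, always keeping at least MIN_STEM_LENGTH characters.
// The affix lists are illustrative assumptions, not the published algorithm.
final class LightStemSketch {
    // Longer prefixes are listed first so that "wal-" wins over "w-".
    private static final String[] PREFIXES = {
        "\u0648\u0627\u0644", // wal-
        "\u0628\u0627\u0644", // bal-
        "\u0643\u0627\u0644", // kal-
        "\u0641\u0627\u0644", // fal-
        "\u0627\u0644",       // al-
        "\u0644\u0644",       // lil-
        "\u0648"              // w-
    };

    // Two-letter suffixes are listed before one-letter suffixes.
    private static final String[] SUFFIXES = {
        "\u0647\u0627", // -ha
        "\u0627\u0646", // -an
        "\u0627\u062A", // -at
        "\u0648\u0646", // -un
        "\u064A\u0646", // -in
        "\u064A\u0629", // -iyya
        "\u0647",       // -h
        "\u0629",       // -a (teh marbuta)
        "\u064A"        // -i
    };

    private static final int MIN_STEM_LENGTH = 2;

    static String stem(String word) {
        String result = word;
        for (String prefix : PREFIXES) {
            if (result.startsWith(prefix)
                    && result.length() - prefix.length() >= MIN_STEM_LENGTH) {
                result = result.substring(prefix.length());
                break; // remove at most one prefix
            }
        }
        for (String suffix : SUFFIXES) {
            if (result.endsWith(suffix)
                    && result.length() - suffix.length() >= MIN_STEM_LENGTH) {
                result = result.substring(0, result.length() - suffix.length());
                break; // remove at most one suffix
            }
        }
        return result;
    }
}
```

    Unlike root extraction, this approach does no morphological analysis: a word with no listed affix is returned unchanged.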

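    The analysis package contains three primary components:

      • ArabicNormalizationFilter: Arabic orthographic normalization.
      • ArabicStemFilter: Arabic light stemming.
      • Arabic stop words file: a set of default Arabic stop words.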

    • Field Detail

      • DEFAULT_STOPWORD_FILE

        public static final java.lang.String DEFAULT_STOPWORD_FILE
        File containing default Arabic stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html. The stopword list is BSD-licensed.
        See Also:
        Constant Field Values
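
        To give a feel for how a stopword file of this kind might be consumed, here is a minimal sketch that reads one word per line into a set. The class name StopwordFileSketch and the convention that '#' starts a comment are assumptions for illustration; the format of the actual file is not described here.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader for a plain-text stopword list, one word per line.
final class StopwordFileSketch {
    // Assumption: '#' starts a comment that runs to the end of the line.
    // The caller remains responsible for closing the reader.
    static Set<String> load(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        new BufferedReader(source).lines().forEach(line -> {
            int hash = line.indexOf('#');
            String word = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!word.isEmpty()) {
                words.add(word); // skip blank and comment-only lines
            }
        });
        // Hand out a read-only view so callers cannot alter the loaded list.
        return Collections.unmodifiableSet(words);
    }
}
```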
    • Constructor Detail

      • ArabicAnalyzer

        public ArabicAnalyzer(Version matchVersion,
                              CharArraySet stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
      • ArabicAnalyzer

        public ArabicAnalyzer(Version matchVersion,
                              CharArraySet stopwords,
                              CharArraySet stemExclusionSet)
        Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer will add a SetKeywordMarkerFilter before ArabicStemFilter.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
        stopwords - a stopword set
        stemExclusionSet - a set of terms not to be stemmed
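
        The ordering described above (stop words removed first, then excluded terms marked as keywords, then stemming applied to everything else) can be modeled in plain Java. This is a conceptual sketch only: StemExclusionSketch, analyze, and toyStem are hypothetical names, the toy stemmer merely strips a leading "al-", and tokenization and normalization are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Conceptual model of stop-word removal followed by keyword-aware stemming.
final class StemExclusionSketch {
    // Hypothetical stand-in for a stemmer: strips a leading "al-" prefix
    // when at least two characters would remain.
    static String toyStem(String token) {
        String al = "\u0627\u0644"; // al-
        return token.startsWith(al) && token.length() - al.length() >= 2
                ? token.substring(al.length())
                : token;
    }

    static List<String> analyze(List<String> tokens,
                                Set<String> stopwords,
                                Set<String> stemExclusionSet) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (stopwords.contains(token)) {
                continue; // stop words are dropped before stemming
            }
            // Marking happens before the stemmer sees the token.
            boolean keyword = stemExclusionSet.contains(token);
            out.add(keyword ? token : toyStem(token)); // keywords bypass stemming
        }
        return out;
    }
}
```

        With the stop word "fi", the tokens "al-kitab" and "al-qahira", and "al-qahira" in the exclusion set, this model would keep "al-qahira" intact while reducing "al-kitab" to "kitab". An empty exclusion set makes every surviving token eligible for stemming.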
    • Method Detail

      • getDefaultStopSet

        public static CharArraySet getDefaultStopSet()
        Returns an unmodifiable instance of the default stop-words set.
        Returns:
        an unmodifiable instance of the default stop-words set.
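
        Because the returned set is unmodifiable, adding project-specific stop words means copying it first. The sketch below shows that pattern with standard java.util collections as an analogy; UnmodifiableStopSetSketch, getDefaults, and withExtra are hypothetical names, and the two default entries are placeholders rather than the real list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Analogy for a shared, read-only default stop set built with java.util.
final class UnmodifiableStopSetSketch {
    // Placeholder defaults ("fi", "min"); not the real stop-word list.
    private static final Set<String> DEFAULTS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("\u0641\u064A", "\u0645\u0646")));

    // Every caller receives the same read-only instance.
    static Set<String> getDefaults() {
        return DEFAULTS;
    }

    // Copy first, then extend: the shared defaults stay untouched.
    static Set<String> withExtra(String... extra) {
        Set<String> copy = new HashSet<>(DEFAULTS);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }
}
```

        In this model, calling add on the result of getDefaults() throws UnsupportedOperationException, while the set returned by withExtra can be changed freely and passed on as a custom stop word set.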