Class CzechAnalyzer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class CzechAnalyzer
    extends StopwordAnalyzerBase
    Analyzer for the Czech language.

    Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
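The stopword behavior described above can be sketched without Lucene. Tokens found in the active set are dropped, and the active set is a built-in default unless the caller supplies one. The class `StopwordSketch`, its method names, and the three-word default set are hypothetical placeholders for illustration only. They are not Lucene's API or its actual Czech stopword list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of "default stopwords unless an alternative list is given".
// Not Lucene code: the real analyzer works with CharArraySet and a stop filter.
final class StopwordSketch {
    // Placeholder default set (a few common Czech function words), NOT Lucene's full list.
    static final Set<String> DEFAULT_STOPWORDS = Set.of("a", "je", "to");

    private final Set<String> stopwords;

    // No explicit list: fall back to the default set.
    StopwordSketch() {
        this(DEFAULT_STOPWORDS);
    }

    // A caller-supplied list replaces the default set entirely.
    StopwordSketch(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Lower-cases each token, then keeps only tokens not in the active stopword set.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopwords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }
}
```

In this sketch a custom set replaces the defaults rather than extending them, matching the wording "unless an alternative list is specified".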

    You must specify the required Version compatibility when creating CzechAnalyzer:

    • As of 3.1, words are stemmed with CzechStemFilter
    • As of 2.9, StopFilter preserves position increments
    • As of 2.4, Tokens incorrectly identified as acronyms are corrected (see LUCENE-1068)
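One way to picture the Version-dependent behavior listed above is a pipeline whose steps switch on the requested compatibility version. The sketch below is hypothetical and uses only the standard library. `VersionedPipelineSketch`, the integer version encoding (29 for 2.9, 31 for 3.1), and the one-rule `toyStem` stand-in for `CzechStemFilter` are all assumptions, and the 2.4 acronym fix is not modeled. It shows what "preserves position increments" means: when a stopword is removed, the next kept token's increment grows so that positions still reflect the gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of version-gated analysis steps; not Lucene's implementation.
final class VersionedPipelineSketch {
    // A kept token plus its position increment relative to the previous kept token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    // Versions are encoded as integers for simplicity: 29 means 2.9, 31 means 3.1 (an assumption).
    static List<Token> analyze(List<String> words, Set<String> stopwords, int version) {
        boolean preserveIncrements = version >= 29; // "As of 2.9"
        boolean stem = version >= 31;               // "As of 3.1"
        List<Token> out = new ArrayList<>();
        int skipped = 0; // stopwords removed since the last kept token
        for (String word : words) {
            if (stopwords.contains(word)) {
                skipped++;
                continue;
            }
            // With preserved increments, each removed stopword widens the gap by one position.
            int increment = preserveIncrements ? 1 + skipped : 1;
            out.add(new Token(stem ? toyStem(word) : word, increment));
            skipped = 0;
        }
        return out;
    }

    // Toy stand-in for CzechStemFilter: strips a trailing "y" from words longer than four characters.
    static String toyStem(String word) {
        return word.length() > 4 && word.endsWith("y") ? word.substring(0, word.length() - 1) : word;
    }
}
```

For the input `hrady a mosty` with stopword `a`, version 3.1 yields `[hrad/1, most/2]`, while a pre-2.9 setting collapses the gap to `[hrady/1, mosty/1]`.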
    • Field Detail

      • DEFAULT_STOPWORD_FILE

        public static final java.lang.String DEFAULT_STOPWORD_FILE
        File containing default Czech stopwords.
        See Also:
        Constant Field Values
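A stopword file such as the one named by `DEFAULT_STOPWORD_FILE` has to be turned into a set before it can be used. The sketch below assumes a simple plain-text format: one word per line, blank lines ignored, and text after `#` treated as a comment. That format and the `StopwordFileSketch.parse` helper are assumptions for illustration and do not come from this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical parser for an ASSUMED stopword-file format:
// one word per line, blank lines ignored, anything after '#' treated as a comment.
final class StopwordFileSketch {
    static Set<String> parse(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int hash = line.indexOf('#');
                String word = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Wrap the checked exception so callers are not forced to handle it.
            throw new UncheckedIOException(e);
        }
        // Read-only view, since a default stopword set is typically shared.
        return Collections.unmodifiableSet(words);
    }
}
```

When reading a real file, wrap the byte stream in a `Reader` with an explicit charset such as UTF-8 (for example `new InputStreamReader(stream, StandardCharsets.UTF_8)`) so that Czech diacritics decode correctly.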
    • Constructor Detail

      • CzechAnalyzer

        public CzechAnalyzer(Version matchVersion)
        Builds an analyzer with the default stop words (getDefaultStopSet()).
        Parameters:
        matchVersion - Lucene version to match; see the Version notes above
      • CzechAnalyzer

        public CzechAnalyzer(Version matchVersion,
                             CharArraySet stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene version to match; see the Version notes above
        stopwords - a stopword set
      • CzechAnalyzer

        public CzechAnalyzer(Version matchVersion,
                             CharArraySet stopwords,
                             CharArraySet stemExclusionTable)
        Builds an analyzer with the given stop words and a set of words to be excluded from stemming by the CzechStemFilter.
        Parameters:
        matchVersion - Lucene version to match; see the Version notes above
        stopwords - a stopword set
        stemExclusionTable - a stemming exclusion set
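The stem-exclusion set lets selected words bypass stemming while all other tokens are stemmed. The standard-library sketch below shows the idea. `StemExclusionSketch` and its one-rule `toyStem` are hypothetical stand-ins; the real `CzechStemFilter` applies considerably more involved Czech suffix rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: stem every token except those listed in an exclusion set.
final class StemExclusionSketch {
    private final Set<String> stemExclusions;

    StemExclusionSketch(Set<String> stemExclusions) {
        this.stemExclusions = stemExclusions;
    }

    List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Excluded words are emitted verbatim; everything else goes through the stemmer.
            out.add(stemExclusions.contains(token) ? token : toyStem(token));
        }
        return out;
    }

    // Toy stand-in for CzechStemFilter: strips a trailing "y" from words longer than four characters.
    static String toyStem(String word) {
        return word.length() > 4 && word.endsWith("y") ? word.substring(0, word.length() - 1) : word;
    }
}
```

With `Karlovy` in the exclusion set, `[hrady, Karlovy]` becomes `[hrad, Karlovy]`. Without the exclusion set, the proper name would be truncated to `Karlov`.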
    • Method Detail

      • getDefaultStopSet

        public static final CharArraySet getDefaultStopSet()
        Returns a set of default Czech stopwords.
        Returns:
        a set of default Czech stopwords
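Because `getDefaultStopSet()` is static and returns a shared default set, one natural way to implement such a method is the initialization-on-demand holder idiom. With this idiom, the set is built once on first use and exposed read-only. The `DefaultStopSetSketch` class and its placeholder words are hypothetical. This page does not say how Lucene builds or guards its default set, and the sketch uses a plain `Set<String>` in place of `CharArraySet`.

```java
import java.util.Set;

// Hypothetical sketch of a lazily created, shared, read-only default stopword set.
final class DefaultStopSetSketch {
    private DefaultStopSetSketch() {}

    // The JVM initializes this nested class (and thus DEFAULT_SET) only on first access,
    // and class initialization is thread-safe, so no explicit locking is needed.
    private static final class Holder {
        // Placeholder words, NOT Lucene's Czech list; Set.of returns an immutable set.
        static final Set<String> DEFAULT_SET = Set.of("a", "je", "to");
    }

    static Set<String> getDefaultStopSet() {
        return Holder.DEFAULT_SET;
    }
}
```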