Class PostingsHighlighter


  • public class PostingsHighlighter
    extends java.lang.Object
    A simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires the field to be indexed with FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

    PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
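
    The passage-building step described above can be sketched with the JDK alone. This is a minimal illustration, not Lucene's implementation: the class PassageSketch, its Span type, and the helper findOffsets are hypothetical names, the hit offsets are found by plain substring search instead of being read from postings, and the raw hit count stands in for a real PassageScorer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; all names here are hypothetical, not Lucene API. */
final class PassageSketch {

    /** A sentence-sized span of the text and the number of term hits inside it. */
    static final class Span {
        final int start;
        final int end;
        final int hits;

        Span(int start, int end, int hits) {
            this.start = start;
            this.end = end;
            this.hits = hits;
        }
    }

    /** Stand-in for postings offsets: start offsets of every occurrence of term, ascending. */
    static int[] findOffsets(String text, String term) {
        List<Integer> found = new ArrayList<>();
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return new int[0];
        }
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) {
            found.add(i);
        }
        return found.stream().mapToInt(Integer::intValue).toArray();
    }

    /**
     * Walks sentences and ascending hit offsets in parallel, coalesces the hits
     * that fall in one sentence into a Span, keeps the maxPassages spans with the
     * most hits, and returns them in document order.
     */
    static List<Span> topPassages(String text, int[] sortedHitOffsets, int maxPassages) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        List<Span> spans = new ArrayList<>();
        int h = 0;
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            int hits = 0;
            while (h < sortedHitOffsets.length && sortedHitOffsets[h] < end) {
                if (sortedHitOffsets[h] >= start) {
                    hits++;
                }
                h++;
            }
            if (hits > 0) {
                spans.add(new Span(start, end, hits));
            }
        }
        // Rank by hit count (a placeholder score), then restore document order.
        List<Span> ranked = new ArrayList<>(spans);
        ranked.sort(Comparator.comparingInt((Span s) -> s.hits).reversed());
        int keep = Math.max(0, Math.min(maxPassages, ranked.size()));
        List<Span> top = new ArrayList<>(ranked.subList(0, keep));
        top.sort(Comparator.comparingInt((Span s) -> s.start));
        return top;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. Highlighting shows matches. "
                + "Highlighting uses offsets for highlighting.";
        for (Span s : topPassages(text, findOffsets(text, "highlighting"), 1)) {
            System.out.println(s.hits + " hit(s): " + text.substring(s.start, s.end).trim());
        }
    }
}
```

    Because the offsets arrive already sorted, the text is scanned once and no per-term position lists need to be held in memory, which is presumably the appeal of the merge-by-offset design.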

    You can customize the behavior by subclassing this highlighter and overriding its hooks.

    WARNING: The code is very new and probably still has some exciting bugs!

    Example usage:

       // configure field with offsets at index time
       FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
       offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
       Field body = new Field("body", "foobar", offsetsType);
    
       // retrieve highlights at query time 
       PostingsHighlighter highlighter = new PostingsHighlighter();
       Query query = new TermQuery(new Term("body", "highlighting"));
       TopDocs topDocs = searcher.search(query, n); // searcher: an open IndexSearcher; n: number of hits to fetch
       String[] highlights = highlighter.highlight("body", query, searcher, topDocs);
     

    This is thread-safe, and can be used across different readers.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_MAX_LENGTH
      Default maximum content size to process.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String[] highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs)
      Highlights the top passages from a single field.
      java.lang.String[] highlight(java.lang.String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
      Highlights the top-N passages from a single field.
      java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
      Highlights the top-N passages from multiple fields, for the provided int[] docids.
      java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
      Highlights the top passages from multiple fields.
      java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
      Highlights the top-N passages from multiple fields.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEFAULT_MAX_LENGTH

        public static final int DEFAULT_MAX_LENGTH
        Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.
        See Also:
        Constant Field Values
    • Constructor Detail

      • PostingsHighlighter

        public PostingsHighlighter()
        Creates a new highlighter with DEFAULT_MAX_LENGTH.
      • PostingsHighlighter

        public PostingsHighlighter(int maxLength)
        Creates a new highlighter, specifying maximum content length.
        Parameters:
        maxLength - maximum content size to process.
        Throws:
        java.lang.IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE
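
        The documented argument check can be sketched as follows. This is an illustration under assumptions, not the library's code: the class MaxLengthSketch and its limit method are hypothetical, and truncating the content to its first maxLength characters is only one plausible reading of "maximum content size to process".

```java
/** Illustrative sketch only; MaxLengthSketch is a hypothetical class, not Lucene API. */
final class MaxLengthSketch {
    private final int maxLength;

    /** Rejects the same values the constructor documentation lists. */
    MaxLengthSketch(int maxLength) {
        if (maxLength < 0 || maxLength == Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "maxLength must be >= 0 and < Integer.MAX_VALUE, got " + maxLength);
        }
        this.maxLength = maxLength;
    }

    /** Assumption: only the first maxLength characters of the content are considered. */
    String limit(String content) {
        return content.length() <= maxLength ? content : content.substring(0, maxLength);
    }

    public static void main(String[] args) {
        System.out.println(new MaxLengthSketch(5).limit("PostingsHighlighter"));
    }
}
```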
    • Method Detail

      • highlight

        public java.lang.String[] highlight(java.lang.String field,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs)
                                     throws java.io.IOException
        Highlights the top passages from a single field.
        Parameters:
        field - field name to highlight. Must have a stored string value and also be indexed with offsets.
        query - query to highlight.
        searcher - searcher that was previously used to execute the query.
        topDocs - TopDocs containing the summary result documents to highlight.
        Returns:
        Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
        Throws:
        java.io.IOException - if an I/O error occurred during processing
        java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
      • highlight

        public java.lang.String[] highlight(java.lang.String field,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs,
                                            int maxPassages)
                                     throws java.io.IOException
        Highlights the top-N passages from a single field.
        Parameters:
        field - field name to highlight. Must have a stored string value and also be indexed with offsets.
        query - query to highlight.
        searcher - searcher that was previously used to execute the query.
        topDocs - TopDocs containing the summary result documents to highlight.
        maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
        Returns:
        Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
        Throws:
        java.io.IOException - if an I/O error occurred during processing
        java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
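
        The fallback described under Returns (the first maxPassages sentences when nothing matched) can be approximated with the JDK's sentence BreakIterator. This is a sketch, assuming the default getSentenceInstance(Locale.ROOT) iterator mentioned in the class description; FallbackSketch and firstSentences are hypothetical names, and the real method also applies its PassageFormatter, which is omitted here.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative sketch only; FallbackSketch is a hypothetical class, not Lucene API. */
final class FallbackSketch {

    /** Returns the leading text covering at most maxPassages sentences. */
    static String firstSentences(String text, int maxPassages) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int end = sentences.first();
        for (int i = 0; i < maxPassages; i++) {
            int next = sentences.next();
            if (next == BreakIterator.DONE) {
                break; // fewer sentences than requested
            }
            end = next;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(firstSentences("One. Two. Three.", 2).trim());
    }
}
```

        Note that a sentence boundary falls after the whitespace that follows the period, so the returned text may end with a trailing space.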
      • highlightFields

        public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields,
                                                                                        Query query,
                                                                                        IndexSearcher searcher,
                                                                                        TopDocs topDocs)
                                                                                 throws java.io.IOException
        Highlights the top passages from multiple fields.

        Conceptually, this behaves as a more efficient form of:

         Map<String,String[]> m = new HashMap<>();
         for (String field : fields) {
           m.put(field, highlight(field, query, searcher, topDocs));
         }
         return m;
         
        Parameters:
        fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
        query - query to highlight.
        searcher - searcher that was previously used to execute the query.
        topDocs - TopDocs containing the summary result documents to highlight.
        Returns:
        Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
        Throws:
        java.io.IOException - if an I/O error occurred during processing
        java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
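
        Callers that render one result at a time may find the field-keyed return value easier to use once it is regrouped per document. The helper below is a sketch using only the JDK: SnippetRows and byDocument are hypothetical names, and it assumes that index i of every snippet array refers to the i-th document of the TopDocs that was highlighted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; SnippetRows is a hypothetical helper, not Lucene API. */
final class SnippetRows {

    /**
     * Regroups field -> snippets[docIndex] into one map per document (field -> snippet).
     * Assumes every array holds at least docCount entries.
     */
    static List<Map<String, String>> byDocument(Map<String, String[]> byField, int docCount) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < docCount; i++) {
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String[]> e : byField.entrySet()) {
                row.put(e.getKey(), e.getValue()[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String[]> byField = new LinkedHashMap<>();
        byField.put("title", new String[] {"first title", "second title"});
        byField.put("body", new String[] {"first body", "second body"});
        System.out.println(byDocument(byField, 2));
    }
}
```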
      • highlightFields

        public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fields,
                                                                                        Query query,
                                                                                        IndexSearcher searcher,
                                                                                        TopDocs topDocs,
                                                                                        int[] maxPassages)
                                                                                 throws java.io.IOException
        Highlights the top-N passages from multiple fields.

        Conceptually, this behaves as a more efficient form of:

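         Map<String,String[]> m = new HashMap<>();
         for (int i = 0; i < fields.length; i++) {
           // maxPassages is per-field: entry i applies to fields[i]
           m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
         }
         return m;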
         
        Parameters:
        fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
        query - query to highlight.
        searcher - searcher that was previously used to execute the query.
        topDocs - TopDocs containing the summary result documents to highlight.
        maxPassages - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
        Returns:
        Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
        Throws:
        java.io.IOException - if an I/O error occurred during processing
        java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
      • highlightFields

        public java.util.Map<java.lang.String,java.lang.String[]> highlightFields(java.lang.String[] fieldsIn,
                                                                                        Query query,
                                                                                        IndexSearcher searcher,
                                                                                        int[] docidsIn,
                                                                                        int[] maxPassagesIn)
                                                                                 throws java.io.IOException
        Highlights the top-N passages from multiple fields, for the provided int[] docids.
        Parameters:
        fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
        query - query to highlight.
        searcher - searcher that was previously used to execute the query.
        docidsIn - array containing the document IDs to highlight.
        maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
        Returns:
        Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first sentences from the field, up to that field's maxPassagesIn limit, will be returned.
        Throws:
        java.io.IOException - if an I/O error occurred during processing
        java.lang.IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS