Class EventBasedExcelExtractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, ExcelExtractor

    public class EventBasedExcelExtractor
    extends POIOLE2TextExtractor
    implements ExcelExtractor
    A text extractor for Excel files that is based on the HSSF EventUserModel API. It typically uses less memory than ExcelExtractor, but may not provide the same richness of formatting. Returns the textual content of the file in a form suitable for indexing by something like Lucene, but not really intended for display to the user.

    To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.

    See Also:
    XLS2CSVmra
    • Constructor Detail

      • EventBasedExcelExtractor

        public EventBasedExcelExtractor(DirectoryNode dir)
      • EventBasedExcelExtractor

        public EventBasedExcelExtractor(POIFSFileSystem fs)
    • Method Detail

      • getSummaryInformation

        public SummaryInformation getSummaryInformation()
        Not supported by this extractor. Would return the summary information metadata for the document if it were supported.
        Overrides:
        getSummaryInformation in class POIOLE2TextExtractor
        Returns:
        The Summary information for the document or null if it could not be read for this document.
      • setIncludeCellComments

        public void setIncludeCellComments(boolean includeComments)
        Not supported by this extractor. Would control the inclusion of cell comments from the document if it were supported.
        Specified by:
        setIncludeCellComments in interface ExcelExtractor
        Parameters:
        includeComments - true if cell comments should be included
      • setIncludeHeadersFooters

        public void setIncludeHeadersFooters(boolean includeHeadersFooters)
        Not supported by this extractor. Would control the inclusion of headers and footers from the document if it were supported.
        Specified by:
        setIncludeHeadersFooters in interface ExcelExtractor
        Parameters:
        includeHeadersFooters - true if headers and footers should be included
      • setIncludeSheetNames

        public void setIncludeSheetNames(boolean includeSheetNames)
        Sets whether sheet names are included in the extracted text. The default is true.
        Specified by:
        setIncludeSheetNames in interface ExcelExtractor
        Parameters:
        includeSheetNames - true if the sheet names should be included
      • setFormulasNotResults

        public void setFormulasNotResults(boolean formulasNotResults)
        Sets whether the formula itself is returned instead of the result it produces. The default is false.
        Specified by:
        setFormulasNotResults in interface ExcelExtractor
        Parameters:
        formulasNotResults - true if the formula itself is returned
      • getText

        public java.lang.String getText()
        Retrieves the text contents of the file.
        Specified by:
        getText in interface ExcelExtractor
        Specified by:
        getText in class POITextExtractor
        Returns:
        All the text from the document
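
Taken together, the constructors and setters documented above might be used roughly as follows. This is a minimal sketch, not an official example: it assumes Apache POI is on the classpath, that the classes live in the packages `org.apache.poi.hssf.extractor` and `org.apache.poi.poifs.filesystem`, and that `workbook.xls` is a hypothetical input file in the binary .xls format. The class name `EventExtractorSketch` is made up for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hssf.extractor.EventBasedExcelExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class EventExtractorSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical input path; pass a real .xls file as the first argument.
        File input = new File(args.length > 0 ? args[0] : "workbook.xls");

        // Open the OLE2 container read-only. Both resources are closed by
        // try-with-resources, since the extractor implements Closeable.
        try (POIFSFileSystem fs = new POIFSFileSystem(input, true);
             EventBasedExcelExtractor extractor = new EventBasedExcelExtractor(fs)) {

            // Omit sheet names from the output (the documented default is true).
            extractor.setIncludeSheetNames(false);

            // Emit formulas themselves rather than their results
            // (the documented default is false).
            extractor.setFormulasNotResults(true);

            // Plain text intended for indexing, not for display.
            String text = extractor.getText();
            System.out.println(text);
        }
    }
}
```

The sketch leaves setIncludeCellComments and setIncludeHeadersFooters alone, since their descriptions above indicate that this extractor does not support them. To start from a DirectoryNode instead, `fs.getRoot()` should yield one suitable for the other constructor.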