Class Word6Extractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class Word6Extractor
    extends POIOLE2TextExtractor
    Class to extract the text from old (Word 6 / Word 95) Word documents. Use this only on those older files; for most uses, call WordExtractor, which deals properly with HWPF.
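    A minimal usage sketch, assuming Apache POI (including the HWPF classes) is on the classpath. The class name Word6TextDump and the helper extractText are hypothetical names chosen for illustration; only the Word6Extractor(InputStream) constructor, getText() and the Closeable contract come from this page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.poi.hwpf.extractor.Word6Extractor;

// Hypothetical demo class, not part of POI.
public class Word6TextDump {

    /** Returns all text of a Word 6 / Word 95 document read from the given stream. */
    static String extractText(InputStream in) throws IOException {
        // Word6Extractor implements Closeable, so try-with-resources applies.
        try (Word6Extractor extractor = new Word6Extractor(in)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of an old-format .doc file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in));
        }
    }
}
```

    Input that is not an OLE2 container is expected to fail with a java.io.IOException from the constructor, matching the throws clause documented below.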
    • Constructor Detail

      • Word6Extractor

        public Word6Extractor(java.io.InputStream is)
                       throws java.io.IOException
        Create a new Word Extractor
        Parameters:
        is - InputStream containing the Word file
        Throws:
        java.io.IOException
      • Word6Extractor

        public Word6Extractor(POIFSFileSystem fs)
                       throws java.io.IOException
        Create a new Word Extractor
        Parameters:
        fs - POIFSFileSystem containing the Word file
        Throws:
        java.io.IOException
      • Word6Extractor

        public Word6Extractor(DirectoryNode dir)
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • Word6Extractor

        public Word6Extractor(HWPFOldDocument doc)
        Create a new Word Extractor
        Parameters:
        doc - The HWPFOldDocument to extract from
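    The class description recommends WordExtractor for most files and Word6Extractor only for the older ones. One way to combine the two is sketched below, using the POIFSFileSystem constructor. It assumes that WordExtractor signals a Word 6 / Word 95 file by throwing org.apache.poi.hwpf.OldWordFileFormatException, which is not described on this page; check this against your POI version. AnyWordText and extractText are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.OldWordFileFormatException;
import org.apache.poi.hwpf.extractor.Word6Extractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI.
public class AnyWordText {

    /** Tries the regular HWPF extractor first, then falls back to Word6Extractor. */
    static String extractText(InputStream in) throws IOException {
        // The file system is opened once and shared by both attempts;
        // closing it at the end of the try block releases the underlying data.
        try (POIFSFileSystem fs = new POIFSFileSystem(in)) {
            try {
                return new WordExtractor(fs).getText();
            } catch (OldWordFileFormatException e) {
                // Assumption: this exception marks a Word 6 / Word 95 document.
                return new Word6Extractor(fs).getText();
            }
        }
    }
}
```

    Opening the POIFSFileSystem once avoids re-reading the stream, which may not be possible for a non-resettable InputStream.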
    • Method Detail

      • getParagraphText

        @Deprecated
        public java.lang.String[] getParagraphText()
        Deprecated.
        Get the text from the Word file, as an array with one String per paragraph
      • getText

        public java.lang.String getText()
        Description copied from class: POITextExtractor
        Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadoc of the specific project for details.
        Specified by:
        getText in class POITextExtractor
        Returns:
        All the text from the document