Class PublisherTextExtractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class PublisherTextExtractor
    extends POIOLE2TextExtractor
    Extracts text from HPBF (Publisher) files.
    • Constructor Detail

      • PublisherTextExtractor

        public PublisherTextExtractor(HPBFDocument doc)
      • PublisherTextExtractor

        public PublisherTextExtractor(DirectoryNode dir)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • PublisherTextExtractor

        public PublisherTextExtractor(POIFSFileSystem fs)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • PublisherTextExtractor

        public PublisherTextExtractor(java.io.InputStream is)
                               throws java.io.IOException
        Throws:
        java.io.IOException
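
The constructors above differ only in where the document comes from. As a minimal sketch of the InputStream variant, assuming the class lives in the package org.apache.poi.hpbf.extractor (the package is not shown on this page) and that the POI library is on the classpath, an extractor can be opened in a try-with-resources block, since the class implements AutoCloseable. The class name PublisherTextSketch, the method extractText and the path example.pub are hypothetical names used for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Assumed package; it is not stated on this page.
import org.apache.poi.hpbf.extractor.PublisherTextExtractor;

public class PublisherTextSketch {

    /** Reads the Publisher file at the given path and returns its text. */
    static String extractText(Path pubFile) throws IOException {
        // Both resources are closed automatically when the block exits,
        // in reverse order of declaration (extractor first, then stream).
        try (InputStream in = Files.newInputStream(pubFile);
             PublisherTextExtractor extractor = new PublisherTextExtractor(in)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // "example.pub" is a placeholder path, not a file shipped with the library.
        String path = args.length > 0 ? args[0] : "example.pub";
        System.out.println(extractText(Paths.get(path)));
    }
}
```

The HPBFDocument, DirectoryNode and POIFSFileSystem constructors would be used the same way when the caller already holds one of those objects.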
    • Method Detail

      • setHyperlinksByDefault

        public void setHyperlinksByDefault(boolean hyperlinksByDefault)
        Sets whether a call to getText() returns hyperlinks inline with the text. The default is false.
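
A short sketch of how this flag might be used: the setter is called before getText(), so the returned string reflects the chosen setting. The package name org.apache.poi.hpbf.extractor is an assumption (it is not shown on this page), and HyperlinkToggleSketch, extract and example.pub are hypothetical names. How the hyperlinks appear in the returned text is not specified here, so the sketch makes no claim about their format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Assumed package; it is not stated on this page.
import org.apache.poi.hpbf.extractor.PublisherTextExtractor;

public class HyperlinkToggleSketch {

    /** Extracts text, optionally asking the extractor to include hyperlinks. */
    static String extract(Path pubFile, boolean includeHyperlinks) throws IOException {
        try (InputStream in = Files.newInputStream(pubFile);
             PublisherTextExtractor extractor = new PublisherTextExtractor(in)) {
            // Set the flag before calling getText(); it is off unless set.
            extractor.setHyperlinksByDefault(includeHyperlinks);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // "example.pub" is a placeholder path used for illustration.
        Path pub = Paths.get(args.length > 0 ? args[0] : "example.pub");
        System.out.println(extract(pub, true));
    }
}
```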
      • getText

        public java.lang.String getText()
        Description copied from class: POITextExtractor
        Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadoc for the specific project for details.
        Specified by:
        getText in class POITextExtractor
        Returns:
        All the text from the document
      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception