Class XWPFWordExtractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class XWPFWordExtractor
    extends POIXMLTextExtractor
    Helper class to extract text from an OOXML Word file
    • Field Detail

      • SUPPORTED_TYPES

        public static final XWPFRelation[] SUPPORTED_TYPES
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception
      • setFetchHyperlinks

        public void setFetchHyperlinks(boolean fetch)
        Sets whether hyperlink targets are also fetched when extracting the text content. By default, only the hyperlink label is output, not the link target.
      • setConcatenatePhoneticRuns

        public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
        Sets whether phonetic runs are concatenated into the extracted text. Default is true.
        Parameters:
        concatenatePhoneticRuns - true to concatenate phonetic runs during extraction
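        The two setters above are plain switches applied before calling getText. A minimal sketch, assuming Apache POI is on the classpath; the class name ExtractorOptionsSketch and the helper extractWithOptions are hypothetical names used only for illustration:

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractorOptionsSketch {

    // Hypothetical helper: extracts text with both options set explicitly.
    static String extractWithOptions(XWPFDocument doc,
                                     boolean fetchHyperlinks,
                                     boolean concatenatePhoneticRuns) throws IOException {
        // Closing the extractor also releases the document's underlying package,
        // so the document should not be reused after this call.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            extractor.setConcatenatePhoneticRuns(concatenatePhoneticRuns);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory document stands in for a real .docx file.
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("See the project site");
        System.out.println(extractWithOptions(doc, true, false));
    }
}
```

        The sample document contains neither hyperlinks nor phonetic runs, so the options have no visible effect here; the sketch only shows where the calls go.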
      • getText

        public java.lang.String getText()
        Description copied from class: POITextExtractor
        Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadocs for a specific project for details.
        Specified by:
        getText in class POITextExtractor
        Returns:
        All the text from the document
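        As an illustration of getText, the following sketch reads a .docx from an input stream and returns its text. It assumes Apache POI is on the classpath; GetTextSketch, extractText and sampleDocx are hypothetical names, and the sample document is built in memory only so that the example is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class GetTextSketch {

    // Hypothetical helper: reads a .docx from the stream and returns all of its text.
    static String extractText(InputStream in) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
            return extractor.getText();
        }
    }

    // Builds a two-paragraph .docx in memory so the sketch needs no file on disk.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("First paragraph");
            doc.createParagraph().createRun().setText("Second paragraph");
            doc.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = extractText(new ByteArrayInputStream(sampleDocx()));
        System.out.print(text);
    }
}
```

        For a file on disk, a stream from java.nio.file.Files.newInputStream could be passed to extractText instead of the in-memory bytes.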
      • appendBodyElementText

        public void appendBodyElementText(java.lang.StringBuilder text,
                                          IBodyElement e)
      • appendParagraphText

        public void appendParagraphText(java.lang.StringBuilder text,
                                        XWPFParagraph paragraph)
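        Because appendBodyElementText and appendParagraphText are public and write into a caller-supplied StringBuilder, they can be used to extract text one element at a time rather than for the whole document. A minimal sketch, assuming Apache POI is on the classpath; BodyElementSketch, textPerElement and paragraphsOnly are hypothetical names, and the newline separator in paragraphsOnly is a choice made for this example:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class BodyElementSketch {

    // Hypothetical helper: returns the text of each top-level body element as its own string.
    static List<String> textPerElement(XWPFDocument doc) throws IOException {
        List<String> parts = new ArrayList<>();
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            for (IBodyElement element : doc.getBodyElements()) {
                StringBuilder text = new StringBuilder();
                extractor.appendBodyElementText(text, element);
                parts.add(text.toString());
            }
        }
        return parts;
    }

    // Hypothetical helper: extracts only the top-level paragraphs, one per line.
    static String paragraphsOnly(XWPFDocument doc) throws IOException {
        StringBuilder text = new StringBuilder();
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                extractor.appendParagraphText(text, paragraph);
                text.append('\n'); // separator chosen for this sketch
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Alpha");
        doc.createParagraph().createRun().setText("Beta");
        for (String part : textPerElement(doc)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

        Each helper closes its extractor, which also releases the document, so a document should be passed to only one of them.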