Class ParsingEmbeddedDocumentExtractor

  • All Implemented Interfaces:
    EmbeddedDocumentExtractor

    public class ParsingEmbeddedDocumentExtractor
    extends java.lang.Object
    implements EmbeddedDocumentExtractor
    Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
    Since:
    Apache Tika 0.8
    • Constructor Detail

      • ParsingEmbeddedDocumentExtractor

        public ParsingEmbeddedDocumentExtractor(ParseContext context)
    • Method Detail

      • parseEmbedded

        public void parseEmbedded(java.io.InputStream stream,
                                  org.xml.sax.ContentHandler handler,
                                  Metadata metadata,
                                  boolean outputHtml)
                           throws org.xml.sax.SAXException,
                                  java.io.IOException
        Description copied from interface: EmbeddedDocumentExtractor
        Processes the supplied embedded resource, calling the delegating parser with the appropriate details.
        Specified by:
        parseEmbedded in interface EmbeddedDocumentExtractor
        Parameters:
        stream - The embedded resource
        handler - The handler to use
        metadata - The metadata for the embedded resource
        outputHtml - true to output HTML for this resource; false if the parser has already done so
        Throws:
        org.xml.sax.SAXException
        java.io.IOException
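
        The call shape described above can be sketched without Tika on the classpath. The block below is a minimal illustration, not Tika's implementation: SketchParser, SketchExtractor, SketchParsingExtractor, RecordingHandler and run are hypothetical stand-ins, a plain Map replaces Metadata, and the div wrapper emitted when outputHtml is true is an assumption about what "output HTML" could mean. Only the org.xml.sax and java.io types are real JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Stdlib-only sketch. Every nested type here is a hypothetical stand-in, not a Tika class.
class EmbeddedExtractorSketch {

    /** Hypothetical stand-in for the delegating parser. */
    interface SketchParser {
        void parse(InputStream stream, ContentHandler handler, Map<String, String> metadata)
                throws SAXException, IOException;
    }

    /** Hypothetical interface mirroring the parseEmbedded parameter list (Map replaces Metadata). */
    interface SketchExtractor {
        void parseEmbedded(InputStream stream, ContentHandler handler,
                           Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException;
    }

    /** Assumed behaviour: optionally wrap the entry in a div, then call the delegate. */
    static class SketchParsingExtractor implements SketchExtractor {
        private final SketchParser delegate;

        SketchParsingExtractor(SketchParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Map<String, String> metadata, boolean outputHtml)
                throws SAXException, IOException {
            if (outputHtml) {
                handler.startElement("", "div", "div", new AttributesImpl());
            }
            delegate.parse(stream, handler, metadata);
            if (outputHtml) {
                handler.endElement("", "div", "div");
            }
        }
    }

    /** Records SAX events as a string so the effect of outputHtml is visible. */
    static class RecordingHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Runs one embedded resource through the sketch and returns the recorded events. */
    static String run(String content, boolean outputHtml) {
        // Toy delegate: emits the whole stream as character data.
        SketchParser textParser = (in, out, meta) -> {
            char[] text = new String(in.readAllBytes(), StandardCharsets.UTF_8).toCharArray();
            out.characters(text, 0, text.length);
        };
        SketchExtractor extractor = new SketchParsingExtractor(textParser);
        RecordingHandler handler = new RecordingHandler();
        Map<String, String> metadata = new HashMap<>();
        metadata.put("name", "note.txt"); // hypothetical metadata key
        try (InputStream stream =
                     new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))) {
            extractor.parseEmbedded(stream, handler, metadata, outputHtml);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("hello", true));   // prints <div>hello</div>
        System.out.println(run("hello", false));  // prints hello
    }
}
```

        In this sketch the caller decides, per resource, whether the extractor adds its own markup; passing false models a container parser that has already written the surrounding HTML itself.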