Class StreamingXMLReader


  • public class StreamingXMLReader
    extends java.lang.Object
    Streaming XML file reader that uses the Woodstox stream reader to extract top-level XML nodes, along with metadata describing their location in the XML file, and sends them to an XMLStreamProcessor.

    Because the file is streamed rather than loaded whole, large files can be processed without significant memory overhead.

    • Constructor Detail

      • StreamingXMLReader

        public StreamingXMLReader()
                           throws javax.xml.parsers.ParserConfigurationException
        Throws:
        javax.xml.parsers.ParserConfigurationException
    • Method Detail

      • readFile

        public void readFile(java.io.File file,
                             XMLStreamProcessor processor)
                      throws java.io.IOException,
                             javax.xml.stream.XMLStreamException
        Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects.

        All (and only) top-level elements, that is, the direct children of the document element, are returned. For example, given an XML file with a structure like

         
         <feed>
          <product></product>
          <product></product>
          <product></product>
         </feed>
         
         

        All product nodes will be returned.

        Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.

        Parameters:
        file - The XML file to process
        processor - XMLStreamProcessor instance
        Throws:
        java.io.FileNotFoundException
        javax.xml.stream.XMLStreamException
        java.io.IOException
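
        The "top level" rule above can be illustrated with the JDK's own StAX API (javax.xml.stream), which Woodstox also implements. This is a minimal sketch, not the implementation of readFile: the class TopLevelElementSketch and its method are hypothetical, it reads from a String rather than a File, and it collects element names instead of building Fragment objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class TopLevelElementSketch {

    /** Returns the local names of the direct children of the document element. */
    static List<String> topLevelElements(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int depth = 0; // 1 = document element, 2 = top-level element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        // reader.getLocation() is where position metadata
                        // could be read at this point in a fuller version.
                        found.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Unchecked here only to keep the sketch short.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return found;
    }
}
```

        Tracking depth keeps memory use independent of file size, since only the current event is held at any time.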
      • readFile

        public void readFile(java.io.File file,
                             XMLStreamProcessor processor,
                             java.util.List<java.lang.String> targetPaths)
                      throws java.io.IOException,
                             javax.xml.stream.XMLStreamException
        Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects. Specify a List of node paths to extract. Example:
         
         <Feed>
             <Category>
                 <Name>Bolts</Name>
                 <Product>Large</Product>
                 <Product>Small</Product>
                 <Services>
                     <Service>Tightening</Service>
                     <Service>Loosening</Service>
                 </Services>
             </Category>
             <Category>
                 <Name>Hammers</Name>
                 <Product>Framing</Product>
                 <Product>Dead Blow</Product>
                 <Services>
                     <Service>Banging</Service>
                 </Services>
             </Category>
         </Feed>
         
         

        You can extract all product and service elements in the same read operation by passing in these paths:
        /Feed/Category/Product
        /Feed/Category/Services/Service

        Namespace prefixes may be specified as they appear in the XML: /aw:PurchaseOrders/aw:PurchaseOrder/aw:Address

        Paths are absolute with respect to the document root. They are normalized to always have a leading slash and never a trailing slash. Overlapping paths are not supported; if paths overlap, the least specific path is used.
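
        The normalization and overlap rules can be sketched as follows. The class TargetPathSketch and both methods are hypothetical and only show one plausible reading of the rules: "overlapping" is assumed to mean that one path is an ancestor of another, and "least specific" is assumed to mean the ancestor (shorter) path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class TargetPathSketch {

    /** Ensures a leading slash and removes any trailing slashes. */
    static String normalize(String path) {
        String p = path.trim();
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p.startsWith("/") ? p : "/" + p;
    }

    /** Normalizes all paths, then drops any path that has an ancestor in the list. */
    static List<String> dropOverlaps(List<String> paths) {
        List<String> normalized = new ArrayList<>();
        for (String p : paths) {
            String n = normalize(p);
            if (!normalized.contains(n)) {
                normalized.add(n);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String candidate : normalized) {
            boolean covered = false;
            for (String other : normalized) {
                // "other" is an ancestor when candidate continues past it with a slash.
                if (!other.equals(candidate) && candidate.startsWith(other + "/")) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

        Comparing against other + "/" rather than other alone avoids treating /Feed/Cat as an ancestor of /Feed/Category.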

        Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.

        Parameters:
        file - The XML file to process
        processor - XMLStreamProcessor instance
        targetPaths - List of node paths to target for extraction
        Throws:
        java.io.FileNotFoundException
        javax.xml.stream.XMLStreamException
        java.io.IOException
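
        Path-targeted extraction can be pictured as matching the path of each opened element against the target list while streaming. The sketch below uses the JDK's StAX API and is not the implementation of readFile: PathMatchSketch and textAtPaths are hypothetical names, names are compared case-sensitively (so the paths must spell element names exactly as they appear in the XML), and the text of each match is collected instead of a Fragment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class PathMatchSketch {

    /** Returns "path = text" for every element whose absolute path is targeted. */
    static List<String> textAtPaths(String xml, Set<String> targetPaths) {
        List<String> matches = new ArrayList<>();
        List<String> open = new ArrayList<>(); // names of currently open elements
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String prefix = reader.getPrefix();
                    String name = (prefix == null || prefix.isEmpty())
                            ? reader.getLocalName()
                            : prefix + ":" + reader.getLocalName();
                    open.add(name);
                    String path = "/" + String.join("/", open);
                    if (targetPaths.contains(path)) {
                        // getElementText only works for text-only elements and
                        // leaves the reader on the matching END_ELEMENT.
                        matches.add(path + " = " + reader.getElementText());
                        open.remove(open.size() - 1);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.remove(open.size() - 1);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Unchecked here only to keep the sketch short.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return matches;
    }
}
```

        Both target paths are matched in a single pass over the input, which is the point of passing several paths to one read operation.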
      • getDocumentElementAttributes

        public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributes()
        Returns a map containing the attribute name-value pairs from the document element.
        Returns:
        Map<String, String> of attribute names to attribute values
      • getDocumentElementAttributeNamespaces

        public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributeNamespaces()
        Returns a map containing the namespace prefix to URI pairs from the document element.
        Returns:
        Map<String, String> of namespace prefixes to namespace URIs
      • getDocumentElementTagName

        public java.lang.String getDocumentElementTagName()
        The local name of the document element tag
        Returns:
        The local name of the document element tag
      • getPrefix

        public java.lang.String getPrefix()
        Returns the prefix of the current event, or null if the event does not have a prefix.
        Returns:
        the prefix or null
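
        The document element accessors above expose information that a StAX reader reports on the first START_ELEMENT event. The following sketch shows how such values can be read with the JDK's javax.xml.stream API. It is an assumption about the general approach, not this class's code: DocumentElementSketch is a hypothetical name and the namespace URI in the usage note is made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class DocumentElementSketch {
    String tagName;   // local name of the document element
    String prefix;    // namespace prefix of the document element, if any
    final Map<String, String> attributes = new LinkedHashMap<>();
    final Map<String, String> namespaces = new LinkedHashMap<>();

    static DocumentElementSketch read(String xml) {
        DocumentElementSketch info = new DocumentElementSketch();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // Skip comments, processing instructions and the like.
            }
            info.tagName = reader.getLocalName();
            info.prefix = reader.getPrefix();
            for (int i = 0; i < reader.getAttributeCount(); i++) {
                info.attributes.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
            }
            for (int i = 0; i < reader.getNamespaceCount(); i++) {
                // The default namespace is reported with a null prefix.
                String nsPrefix = reader.getNamespacePrefix(i);
                info.namespaces.put(nsPrefix == null ? "" : nsPrefix, reader.getNamespaceURI(i));
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Unchecked here only to keep the sketch short.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return info;
    }
}
```

        For example, DocumentElementSketch.read("<aw:PurchaseOrders xmlns:aw=\"http://example.com/aw\" version=\"2\"/>") would be expected to report the tag name PurchaseOrders, the prefix aw, one attribute and one namespace declaration.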
      • getCharacterEncodingScheme

        public java.lang.String getCharacterEncodingScheme()
        Returns the character encoding declared in the XML declaration, or null if none was declared.
        Returns:
        the encoding declared in the document or null
        See Also:
        XMLStreamReader
      • getEncoding

        public java.lang.String getEncoding()
        Return input encoding if known or null if unknown.
        Returns:
        the encoding of this instance or null
        See Also:
        XMLStreamReader
      • getVersion

        public java.lang.String getVersion()
        Returns the XML version declared in the XML declaration, or null if none was declared.
        Returns:
        the XML version or null
        See Also:
        XMLStreamReader
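
        getCharacterEncodingScheme, getEncoding and getVersion share their names with methods of javax.xml.stream.XMLStreamReader, which the See Also entries point to. The sketch below calls the StAX methods directly on a JDK reader to show what they report for a document with an XML declaration. XmlDeclarationSketch is a hypothetical name, and whether this class simply delegates to the underlying reader is an assumption.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class XmlDeclarationSketch {

    /** Returns { declared version, declared encoding }; either may be null. */
    static String[] declaredVersionAndEncoding(byte[] xml) {
        try {
            // A byte stream is used so the parser sees the encoding declaration itself.
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            // The reader starts on START_DOCUMENT, where the declaration is available.
            String[] result = { reader.getVersion(), reader.getCharacterEncodingScheme() };
            reader.close();
            return result;
        } catch (XMLStreamException e) {
            // Unchecked here only to keep the sketch short.
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

        getEncoding differs from getCharacterEncodingScheme in that it reports the encoding of the input as known to the parser, which may be available even when the declaration names none.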
      • getEmptyDocument

        public org.w3c.dom.Document getEmptyDocument(java.io.File file)
                                              throws java.io.FileNotFoundException,
                                                     javax.xml.stream.XMLStreamException,
                                                     XMLParseException
        Parameters:
        file - XML file to extract an empty document for
        Returns:
        Document containing only the document element from the file provided
        Throws:
        java.io.FileNotFoundException
        javax.xml.stream.XMLStreamException
        XMLParseException
      • getEmptyDocument

        public org.w3c.dom.Document getEmptyDocument()
                                              throws XMLParseException
        Returns:
        Document containing only the document element from the last file provided to this instance of StreamingXMLReader
        Throws:
        XMLParseException
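
        An "empty document" in the sense used above is a DOM Document whose only node is the document element, with no children. A minimal sketch of how such a document could be built from a stream is shown here. EmptyDocumentSketch is a hypothetical name, the input is a String rather than a File, and attributes are copied without their namespaces for brevity; none of this is confirmed to match the class's own behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration; not part of StreamingXMLReader.
final class EmptyDocumentSketch {

    /** Builds a Document that contains only a copy of the document element. */
    static Document emptyDocument(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // Skip anything before the document element.
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            String prefix = reader.getPrefix();
            String qualifiedName = (prefix == null || prefix.isEmpty())
                    ? reader.getLocalName()
                    : prefix + ":" + reader.getLocalName();
            Element root = doc.createElementNS(reader.getNamespaceURI(), qualifiedName);
            for (int i = 0; i < reader.getAttributeCount(); i++) {
                root.setAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
            }
            doc.appendChild(root);
            reader.close(); // stop here; the children are never read
            return doc;
        } catch (XMLStreamException | ParserConfigurationException e) {
            // Unchecked here only to keep the sketch short.
            throw new IllegalArgumentException("Could not build empty document", e);
        }
    }
}
```

        Because reading stops at the first START_ELEMENT, the cost does not depend on how large the rest of the file is.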
      • getOuterDocument

        public OuterDocument getOuterDocument(java.io.File file)
                                       throws java.lang.Exception
        Parameters:
        file - XML file to parse into an OuterDocument
        Returns:
        OuterDocument wrapper around the empty Document, which contains only the document element from the file provided
        Throws:
        java.lang.Exception
      • getOuterDocument

        public OuterDocument getOuterDocument()
                                       throws java.lang.Exception
        Returns:
        OuterDocument wrapper around the empty Document, which contains only the document element from the last file provided to this instance of StreamingXMLReader
        Throws:
        java.lang.Exception