java.lang.Object

com.thirdpartylabs.xmlscalpel.io.reader.StreamingXMLReader

public class StreamingXMLReader
extends java.lang.Object

Streaming XML file reader that uses the Woodstox stream reader to extract top-level XML nodes, along with metadata describing their location in the XML file, and sends them to an XMLStreamProcessor.

Using the streaming reader allows large files to be processed without loading the entire document into memory.

Constructor Summary

Constructors

Constructor	Description
`StreamingXMLReader()`

Method Summary

Modifier and Type	Method	Description
`java.lang.String`	`getCharacterEncodingScheme()`	Returns the character encoding declared in the XML declaration, or null if none was declared.
`java.util.Map<java.lang.String,java.lang.String>`	`getDocumentElementAttributeNamespaces()`	A map containing the namespace prefix to URI pairs from the document element.
`java.util.Map<java.lang.String,java.lang.String>`	`getDocumentElementAttributes()`	A map containing the attribute name-value pairs from the document element.
`java.lang.String`	`getDocumentElementTagName()`	The local name of the document element tag.
`org.w3c.dom.Document`	`getEmptyDocument()`	Returns a Document containing only the document element from the last file provided to this instance.
`org.w3c.dom.Document`	`getEmptyDocument(java.io.File file)`	Returns a Document containing only the document element from the file provided.
`java.lang.String`	`getEncoding()`	Returns the input encoding if known, or null if unknown.
`OuterDocument`	`getOuterDocument()`	Returns an OuterDocument wrapping the empty Document for the last file provided to this instance.
`OuterDocument`	`getOuterDocument(java.io.File file)`	Returns an OuterDocument wrapping the empty Document for the file provided.
`java.lang.String`	`getPrefix()`	Returns the prefix of the current event, or null if the event does not have a prefix.
`java.lang.String`	`getVersion()`	Returns the XML version declared in the XML declaration, or null if none was declared.
`void`	`readFile(java.io.File file, XMLStreamProcessor processor)`	Reads an XML file using the `Woodstox` streaming API and supplies the `XMLStreamProcessor` with `Fragment` objects for all top-level elements.
`void`	`readFile(java.io.File file, XMLStreamProcessor processor, java.util.List<java.lang.String> targetPaths)`	Reads an XML file using the `Woodstox` streaming API and supplies the `XMLStreamProcessor` with `Fragment` objects for the elements matching `targetPaths`.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- StreamingXMLReader
  
  public StreamingXMLReader() throws javax.xml.transform.TransformerConfigurationException
  
  Throws:
  
  javax.xml.transform.TransformerConfigurationException
Method Details
- readFile
  
  public void readFile(java.io.File file, XMLStreamProcessor processor, java.util.List<java.lang.String> targetPaths) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, javax.xml.transform.TransformerException
  Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects. Specify a List of node paths to extract. For example, given this document:

      <Feed>
        <Category>
          <Name>Bolts</Name>
          <Product>Large</Product>
          <Product>Small</Product>
          <Services>
            <Service>Tightening</Service>
            <Service>Loosening</Service>
          </Services>
        </Category>
        <Category>
          <Name>Hammers</Name>
          <Product>Framing</Product>
          <Product>Dead Blow</Product>
          <Services>
            <Service>Banging</Service>
          </Services>
        </Category>
      </Feed>
  
  You can extract all Product and Service elements in the same read operation by passing in these paths:

      /Feed/Category/Product
      /Feed/Category/Services/Service

  Namespace prefixes may be specified as they appear in the XML: /aw:PurchaseOrders/aw:PurchaseOrder/aw:Address
  Paths are absolute with respect to the document root; they are normalized to always have a leading slash and never a trailing slash. Overlapping paths are not supported; when paths overlap, the least specific path is used.
  Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.
  Parameters:
  
  file - The XML file to process
  
  processor - XMLStreamProcessor instance
  
  targetPaths - List of node paths to target for extraction
  
  Throws:
  
  java.io.FileNotFoundException
  
  javax.xml.stream.XMLStreamException
  
  javax.xml.transform.TransformerException
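
The path rules described for targetPaths (leading slash, no trailing slash, least specific path wins) can be illustrated with a small JDK-only sketch. This is not the library's code: the class `TargetPathSketch` and its methods are hypothetical, and the overlap rule is an assumed reading in which a path is dropped whenever one of its ancestors is also requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TargetPathSketch {

    /** Normalizes a path to have a leading slash and no trailing slash. */
    static String normalize(String path) {
        String p = path.trim();
        // Strip trailing slashes, but leave a lone "/" untouched.
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p.startsWith("/") ? p : "/" + p;
    }

    /**
     * Assumed overlap rule: when one requested path is an ancestor of
     * another, only the ancestor (the least specific path) is kept.
     */
    static List<String> dropOverlaps(List<String> paths) {
        Set<String> normalized = new LinkedHashSet<>();
        for (String path : paths) {
            normalized.add(normalize(path));
        }
        List<String> kept = new ArrayList<>();
        for (String candidate : normalized) {
            boolean shadowed = false;
            for (String other : normalized) {
                // "other" is an ancestor if the candidate continues past it with a slash.
                if (!other.equals(candidate) && candidate.startsWith(other + "/")) {
                    shadowed = true;
                    break;
                }
            }
            if (!shadowed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "Feed/Category/", "/Feed/Category/Product", "/Feed/Other");
        System.out.println(dropOverlaps(paths));
    }
}
```

Since overlapping paths are documented as unsupported, callers are probably better off passing a list that contains no ancestor and descendant pairs in the first place.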
- readFile
  
  public void readFile(java.io.File file, XMLStreamProcessor processor) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, javax.xml.transform.TransformerException
  Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects.
  All (and only) top-level elements are returned. For example, given an XML file with a structure like

      <feed>
        <product></product>
        <product></product>
        <product></product>
      </feed>

  All product nodes will be returned.
  Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.
  Parameters:
  
  file - The XML file to process
  
  processor - XMLStreamProcessor instance
  
  Throws:
  
  java.io.FileNotFoundException
  
  javax.xml.stream.XMLStreamException
  
  javax.xml.transform.TransformerException
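
To make "top-level elements" concrete, here is a minimal sketch that finds the direct children of the document element with the JDK's built-in StAX API. It only illustrates the selection rule described above; it does not use Woodstox or XMLStreamProcessor, it does not build Fragment objects, and the class name `TopLevelElementSketch` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TopLevelElementSketch {

    /** Returns the local names of the document element's direct children, in document order. */
    static List<String> topLevelElementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        int depth = 0; // 1 = document element, 2 = top-level element
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><product><name>a</name></product><product/><product/></feed>";
        System.out.println(topLevelElementNames(xml));
    }
}
```

Nested elements such as `name` are skipped because they sit at depth 3, which matches the "all (and only) top-level elements" wording.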
- getDocumentElementAttributes
  
  public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributes()
  
  A map containing the attribute name-value pairs from the document element
  
  Returns:
  
  Map<String, String> of attribute names to attribute values
- getDocumentElementAttributeNamespaces
  
  public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributeNamespaces()
  
  A map containing the namespace prefix to URI pairs from the document element
  
  Returns:
  
  Map<String, String> of namespace prefixes to namespace URIs
- getDocumentElementTagName
  
  public java.lang.String getDocumentElementTagName()
  
  The local name of the document element tag
  
  Returns:
  
  The local name of the document element tag
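
The three document element accessors above return information that is available as soon as the first start tag has been read. The following sketch collects the same kind of data with the JDK's StAX API. It is an illustration under assumptions, not the library's implementation: `DocumentElementSketch` is a hypothetical class, and keying prefixed attributes as `prefix:name` is a choice made for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DocumentElementSketch {
    String tagName;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final Map<String, String> namespaces = new LinkedHashMap<>();

    /** Reads only as far as the document element and records its name, attributes and namespaces. */
    static DocumentElementSketch read(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip comments and processing instructions in the prolog
                }
                DocumentElementSketch info = new DocumentElementSketch();
                info.tagName = reader.getLocalName();
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    String prefix = reader.getAttributePrefix(i);
                    String local = reader.getAttributeLocalName(i);
                    String key = (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
                    info.attributes.put(key, reader.getAttributeValue(i));
                }
                for (int i = 0; i < reader.getNamespaceCount(); i++) {
                    // The default namespace has no prefix; store it under the empty string.
                    String prefix = reader.getNamespacePrefix(i);
                    info.namespaces.put(prefix == null ? "" : prefix, reader.getNamespaceURI(i));
                }
                return info;
            }
            throw new XMLStreamException("No document element found");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        DocumentElementSketch info = read(
                "<aw:PurchaseOrders xmlns:aw=\"urn:example:aw\" version=\"2\"/>");
        System.out.println(info.tagName + " " + info.attributes + " " + info.namespaces);
    }
}
```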
- getPrefix
  
  public java.lang.String getPrefix()
  
  Returns the prefix of the current event, or null if the event does not have a prefix
  
  Returns:
  
  the prefix or null
- getCharacterEncodingScheme
  
  public java.lang.String getCharacterEncodingScheme()
  
  Returns the character encoding declared in the XML declaration, or null if none was declared.
  
  Returns:
  
  the encoding declared in the document or null
  
  See Also:
  
  XMLStreamReader
- getEncoding
  
  public java.lang.String getEncoding()
  
  Returns the input encoding if known, or null if unknown.
  
  Returns:
  
  the encoding of this instance or null
  
  See Also:
  
  XMLStreamReader
- getVersion
  
  public java.lang.String getVersion()
  
  Returns the XML version declared in the XML declaration, or null if none was declared.
  
  Returns:
  
  the XML version or null
  
  See Also:
  
  XMLStreamReader
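
getCharacterEncodingScheme, getEncoding and getVersion share their names with methods on the StAX XMLStreamReader interface listed under See Also. As a rough illustration of what those values look like, the sketch below queries a plain `javax.xml.stream.XMLStreamReader` directly. It assumes the reader exposes the declaration values while positioned on START_DOCUMENT, and the class name `XmlDeclarationSketch` is hypothetical. The exact value of `getEncoding()` can vary between StAX implementations, so it is printed without any expectation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class XmlDeclarationSketch {

    /** Returns {declared version, declared encoding, detected input encoding}; entries may be null. */
    static String[] describe(byte[] xmlBytes) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        try {
            // A freshly created reader is positioned on START_DOCUMENT,
            // which is where the XML declaration values are exposed.
            return new String[] {
                reader.getVersion(),
                reader.getCharacterEncodingScheme(),
                reader.getEncoding()
            };
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed/>"
                .getBytes(StandardCharsets.UTF_8);
        String[] info = describe(xml);
        System.out.println("version=" + info[0]
                + ", declared=" + info[1]
                + ", detected=" + info[2]);
    }
}
```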
- getEmptyDocument
  
  public org.w3c.dom.Document getEmptyDocument(java.io.File file) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, javax.xml.parsers.ParserConfigurationException, javax.management.modelmbean.XMLParseException
  
  Parameters:
  
  file - XML file to extract an empty document for
  
  Returns:
  
  Document containing only the document element from the file provided
  
  Throws:
  
  java.io.FileNotFoundException
  
  javax.xml.stream.XMLStreamException
  
  javax.xml.parsers.ParserConfigurationException
  
  javax.management.modelmbean.XMLParseException
- getEmptyDocument
  
  public org.w3c.dom.Document getEmptyDocument() throws javax.xml.parsers.ParserConfigurationException, javax.management.modelmbean.XMLParseException
  
  Returns:
  
  Document containing only the document element from the last file provided to this instance of StreamingXMLReader
  
  Throws:
  
  javax.xml.parsers.ParserConfigurationException
  
  javax.management.modelmbean.XMLParseException
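
An "empty document" in the sense used by getEmptyDocument is a DOM Document whose document element is present but has no children. The sketch below builds one with JDK APIs only, to show the shape of the result. It is an assumption-laden illustration rather than the library's implementation: `EmptyDocumentSketch` is a hypothetical class, and namespace declarations are not copied as `xmlns` attributes here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EmptyDocumentSketch {

    /** Builds a Document holding a childless copy of the document element found in xml. */
    static Document emptyDocumentFor(String xml)
            throws XMLStreamException, ParserConfigurationException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // still in the prolog
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document document = factory.newDocumentBuilder().newDocument();
                Element root = document.createElementNS(
                        emptyToNull(reader.getNamespaceURI()),
                        qualified(reader.getPrefix(), reader.getLocalName()));
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    root.setAttributeNS(
                            emptyToNull(reader.getAttributeNamespace(i)),
                            qualified(reader.getAttributePrefix(i), reader.getAttributeLocalName(i)),
                            reader.getAttributeValue(i));
                }
                document.appendChild(root);
                return document; // stop before any child content is read
            }
            throw new XMLStreamException("No document element found");
        } finally {
            reader.close();
        }
    }

    private static String qualified(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }

    private static String emptyToNull(String value) {
        return (value == null || value.isEmpty()) ? null : value;
    }

    public static void main(String[] args) throws Exception {
        Document doc = emptyDocumentFor("<feed version=\"2\"><product/><product/></feed>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```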
- getOuterDocument
  
  public OuterDocument getOuterDocument(java.io.File file) throws java.lang.Exception
  
  Parameters:
  
  file - XML file to parse into an OuterDocument
  
  Returns:
  
  OuterDocument wrapping an empty Document that contains only the document element from the file provided
  
  Throws:
  
  java.lang.Exception
- getOuterDocument
  
  public OuterDocument getOuterDocument() throws java.lang.Exception
  
  Returns:
  
  OuterDocument wrapping an empty Document that contains only the document element from the last file provided to this instance of StreamingXMLReader
  
  Throws:
  
  java.lang.Exception