Class StreamingXMLReader
- java.lang.Object
-
- com.thirdpartylabs.xmlscalpel.io.reader.StreamingXMLReader
-
public class StreamingXMLReader extends java.lang.Object

Streaming XML file reader that uses the Woodstox stream reader to extract top-level XML nodes, along with metadata describing their location in the XML file, and send them to an XMLStreamProcessor. Using the streaming reader allows large files to be processed without significant memory overhead.
-
-
Constructor Summary
Constructors
Constructor    Description
StreamingXMLReader()
-
Method Summary
Modifier and Type    Method    Description
java.lang.String getCharacterEncodingScheme()
    Returns the character encoding declared on the XML declaration. Returns null if none was declared.
java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributeNamespaces()
    A map containing the namespace prefix to URI pairs from the document element.
java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributes()
    A map containing the attribute name-value pairs from the document element.
java.lang.String getDocumentElementTagName()
    The local name of the document element tag.
org.w3c.dom.Document getEmptyDocument()
org.w3c.dom.Document getEmptyDocument(java.io.File file)
java.lang.String getEncoding()
    Return input encoding if known or null if unknown.
OuterDocument getOuterDocument()
OuterDocument getOuterDocument(java.io.File file)
java.lang.String getPrefix()
    Returns the prefix of the current event or null if the event does not have a prefix.
java.lang.String getVersion()
    Get the XML version declared on the XML declaration. Returns null if none was declared.
void readFile(java.io.File file, XMLStreamProcessor processor)
    Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects.
void readFile(java.io.File file, XMLStreamProcessor processor, java.util.List<java.lang.String> targetPaths)
    Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects.
-
-
-
Method Detail
-
readFile
public void readFile(java.io.File file, XMLStreamProcessor processor) throws java.io.IOException, javax.xml.stream.XMLStreamException

Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects.

All (and only) top-level elements, that is, the direct children of the document element, are returned. For example, given an XML file with a structure like

    <feed>
        <product></product>
        <product></product>
        <product></product>
    </feed>

all product nodes will be returned.

Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.

Parameters:
file - The XML file to process
processor - XMLStreamProcessor instance
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
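The idea of returning only the direct children of the document element can be illustrated with the JDK's built-in StAX reader. This is a minimal sketch under stated assumptions, not the library's implementation: it does not use Woodstox, XMLStreamProcessor, or Fragment, and the names TopLevelElementSketch and topLevelNames are hypothetical. It only tracks element depth and collects the local names of the elements found directly below the document element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not part of the xmlscalpel API.
class TopLevelElementSketch {

    /** Returns the local names of the elements directly below the document element. */
    static List<String> topLevelNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                int depth = 0; // 0 = outside the document element
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // depth 1 means "currently inside the document element only"
                        if (depth == 1) {
                            names.add(reader.getLocalName());
                        }
                        depth++;
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<feed><product></product><product></product><product></product></feed>";
        System.out.println(topLevelNames(xml)); // prints [product, product, product]
    }
}
```

Nested elements are skipped because they start at a depth greater than 1, which matches the "all and only top-level elements" wording above.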
-
readFile
public void readFile(java.io.File file, XMLStreamProcessor processor, java.util.List<java.lang.String> targetPaths) throws java.io.IOException, javax.xml.stream.XMLStreamException

Read an XML file using the Woodstox streaming API and supply the XMLStreamProcessor with Fragment objects. Specify a List of node paths to extract. Example:

    <xml>
    <Feed>
        <Category>
            <Name>Bolts</Name>
            <Product>Large</Product>
            <Product>Small</Product>
            <Services>
                <Service>Tightening</Service>
                <Service>Loosening</Service>
            </Services>
        </Category>
        <Category>
            <Name>Hammers</Name>
            <Product>Framing</Product>
            <Product>Dead Blow</Product>
            <Services>
                <Service>Banging</Service>
            </Services>
        </Category>
    </Feed>

You can extract all Product and Service elements in the same read operation by passing in these paths:

    /feed/category/Product
    /feed/category/Services/Service

Namespace prefixes may be specified as they appear in the XML:

    /aw:PurchaseOrders/aw:PurchaseOrder/aw:Address

Paths are absolute with respect to the document root. They are normalized to always have a leading slash and never have a trailing slash. Overlapping paths are not supported; if paths overlap, the least specific path is used.

Fragment objects wrap the DOM node as a DocumentFragment, together with an XMLByteLocation object that describes the node's location in the XML file. This allows efficient retrieval of the nodes later using the RandomAccessXMLReader.

Parameters:
file - The XML file to process
processor - XMLStreamProcessor instance
targetPaths - List of node paths to target for extraction
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
java.io.IOException
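The path rules stated above (leading slash, no trailing slash, least specific path wins on overlap) can be sketched as follows. Everything here is an assumption made for illustration: TargetPathSketch, normalize, and keepLeastSpecific are hypothetical names, and treating "overlapping" as "one path is an ancestor of another" is one plausible reading of the description, not confirmed library behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the xmlscalpel API.
class TargetPathSketch {

    /** Adds a leading slash if missing and strips any trailing slashes. */
    static String normalize(String path) {
        String p = path.trim();
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        if (!p.startsWith("/")) {
            p = "/" + p;
        }
        return p;
    }

    /**
     * Normalizes the paths, removes duplicates, and drops every path that has
     * an ancestor in the list (assumed meaning of "overlapping").
     */
    static List<String> keepLeastSpecific(List<String> paths) {
        List<String> normalized = new ArrayList<>();
        for (String path : paths) {
            String n = normalize(path);
            if (!normalized.contains(n)) {
                normalized.add(n);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String candidate : normalized) {
            boolean hasAncestor = false;
            for (String other : normalized) {
                if (!other.equals(candidate) && candidate.startsWith(other + "/")) {
                    hasAncestor = true;
                    break;
                }
            }
            if (!hasAncestor) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "feed/category/Product/", "/feed/category", "/feed/other");
        System.out.println(keepLeastSpecific(paths)); // prints [/feed/category, /feed/other]
    }
}
```

Comparing against `other + "/"` rather than `other` alone keeps sibling names such as /feed/cat and /feed/category from being treated as overlapping.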
-
getDocumentElementAttributes
public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributes()
A map containing the attribute name-value pairs from the document element.
Returns:
Map<String,String>
-
getDocumentElementAttributeNamespaces
public java.util.Map<java.lang.String,java.lang.String> getDocumentElementAttributeNamespaces()
A map containing the namespace prefix to URI pairs from the document element.
Returns:
Map<String,String>
-
getDocumentElementTagName
public java.lang.String getDocumentElementTagName()
The local name of the document element tag.
Returns:
The local name of the document element tag
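The three document element accessors above expose the root element's attributes, its namespace declarations, and its local name. The sketch below shows how the same information can be read with the JDK's StAX reader. It is an illustration only: DocumentElementSketch and read are hypothetical names, and keying attributes by local name and the default namespace by an empty string are assumptions, since the source does not specify the key format the library uses.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not part of the xmlscalpel API.
class DocumentElementSketch {
    final String localName;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final Map<String, String> namespaces = new LinkedHashMap<>();

    private DocumentElementSketch(String localName) {
        this.localName = localName;
    }

    /** Reads only as far as the first start tag, then stops. */
    static DocumentElementSketch read(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    DocumentElementSketch result =
                            new DocumentElementSketch(reader.getLocalName());
                    // Assumption: attributes are keyed by local name.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        result.attributes.put(
                                reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    // Assumption: the default namespace is keyed by an empty prefix.
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        String prefix = reader.getNamespacePrefix(i);
                        result.namespaces.put(
                                prefix == null ? "" : prefix, reader.getNamespaceURI(i));
                    }
                    return result;
                }
                throw new IllegalArgumentException("No document element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        DocumentElementSketch root = read(
                "<aw:PurchaseOrders xmlns:aw=\"http://example.com/aw\" version=\"2\">"
                + "<aw:PurchaseOrder/></aw:PurchaseOrders>");
        System.out.println(root.localName);
        System.out.println(root.attributes);
        System.out.println(root.namespaces);
    }
}
```

In a namespace-aware StAX reader, xmlns declarations are reported separately from ordinary attributes, which is why the two maps do not overlap.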
-
getPrefix
public java.lang.String getPrefix()
Returns the prefix of the current event or null if the event does not have a prefix.
Returns:
the prefix or null
-
getCharacterEncodingScheme
public java.lang.String getCharacterEncodingScheme()
Returns the character encoding declared on the XML declaration. Returns null if none was declared.
Returns:
the encoding declared in the document or null
See Also:
XMLStreamReader
-
getEncoding
public java.lang.String getEncoding()
Return input encoding if known or null if unknown.
Returns:
the encoding of this instance or null
See Also:
XMLStreamReader
-
getVersion
public java.lang.String getVersion()
Get the XML version declared on the XML declaration. Returns null if none was declared.
Returns:
the XML version or null
See Also:
XMLStreamReader
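getCharacterEncodingScheme, getEncoding, and getVersion carry the same names and descriptions as the corresponding methods of javax.xml.stream.XMLStreamReader, which the See Also entries point to. The sketch below calls those standard StAX methods directly on the JDK's reader to show what each one reports. It assumes, without confirmation from the source, that StreamingXMLReader forwards to the underlying stream reader in a similar way; XmlDeclarationSketch and describe are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not part of the xmlscalpel API.
class XmlDeclarationSketch {

    /**
     * Returns { declared version, declared encoding, detected input encoding }.
     * Any entry may be null, as described in the XMLStreamReader Javadoc.
     */
    static String[] describe(byte[] xmlBytes) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
            try {
                return new String[] {
                    reader.getVersion(),                 // from the XML declaration
                    reader.getCharacterEncodingScheme(), // from the XML declaration
                    reader.getEncoding()                 // detected by the parser, if known
                };
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(describe(xml)));
    }
}
```

The distinction worth noting is that the first two values come from the XML declaration itself, while getEncoding reports what the parser determined about the input, so the two encodings can differ or be null independently.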
-
getEmptyDocument
public org.w3c.dom.Document getEmptyDocument(java.io.File file) throws java.io.FileNotFoundException, javax.xml.stream.XMLStreamException, XMLParseException
Parameters:
file - XML file to extract an empty document for
Returns:
Document containing only the document element from the file provided
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
XMLParseException
-
getEmptyDocument
public org.w3c.dom.Document getEmptyDocument() throws XMLParseException
Returns:
Document containing only the document element from the last file provided to this instance of StreamingXMLReader
Throws:
XMLParseException
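A Document "containing only the document element" can be pictured as the root start tag with its attributes and namespace declarations, and no children. The sketch below builds such a document with the JDK's StAX and DOM APIs. It is a hedged illustration of the concept, not the library's code: EmptyDocumentSketch and emptyDocument are hypothetical names, the input is a String rather than a File to keep the example self-contained, and copying the root's attributes is an assumption about what the library preserves.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration; not part of the xmlscalpel API.
class EmptyDocumentSketch {

    /** Builds a DOM Document holding a childless copy of the document element. */
    static Document emptyDocument(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    Document doc = DocumentBuilderFactory.newInstance()
                            .newDocumentBuilder().newDocument();
                    Element root = doc.createElementNS(
                            emptyToNull(reader.getNamespaceURI()),
                            qualified(reader.getPrefix(), reader.getLocalName()));
                    // Copy namespace declarations as xmlns / xmlns:prefix attributes.
                    for (int i = 0; i < reader.getNamespaceCount(); i++) {
                        String prefix = reader.getNamespacePrefix(i);
                        String name = (prefix == null || prefix.isEmpty())
                                ? "xmlns" : "xmlns:" + prefix;
                        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI,
                                name, reader.getNamespaceURI(i));
                    }
                    // Copy ordinary attributes.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        root.setAttributeNS(
                                emptyToNull(reader.getAttributeNamespace(i)),
                                qualified(reader.getAttributePrefix(i),
                                        reader.getAttributeLocalName(i)),
                                reader.getAttributeValue(i));
                    }
                    doc.appendChild(root);
                    return doc; // stop before any child content is read
                }
                throw new IllegalArgumentException("No document element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | ParserConfigurationException e) {
            throw new IllegalArgumentException("Could not build empty document", e);
        }
    }

    private static String qualified(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }

    private static String emptyToNull(String value) {
        return (value == null || value.isEmpty()) ? null : value;
    }

    public static void main(String[] args) {
        Document doc = emptyDocument("<feed version=\"2\"><product>a</product></feed>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().hasChildNodes());
    }
}
```

Because reading stops at the first start tag, the cost of building the empty document does not depend on the size of the rest of the file.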
-
getOuterDocument
public OuterDocument getOuterDocument(java.io.File file) throws java.lang.Exception
Parameters:
file - XML file to parse into an OuterDocument
Returns:
OuterDocument wrapper around the empty Document that contains only the document element from the file provided
Throws:
java.lang.Exception
-
getOuterDocument
public OuterDocument getOuterDocument() throws java.lang.Exception
Returns:
OuterDocument wrapper around the empty Document that contains only the document element from the last file provided to this instance of StreamingXMLReader
Throws:
java.lang.Exception
-
-