Parses the input stream into a yaidom Document. Closes the input stream afterwards.
If the created DefaultHandler is a LexicalHandler, this LexicalHandler is registered. In practice, all SAX parsers should support LexicalHandlers.
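The behavior described above can be approximated with plain JAXP. The sketch below is not yaidom's actual implementation. The object LexicalRegistration, its parse method and the CommentCounter and TrackingStream helpers are hypothetical. The lexical-handler property URI is the standard SAX 2 one.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource
import org.xml.sax.ext.{DefaultHandler2, LexicalHandler}
import org.xml.sax.helpers.DefaultHandler

object LexicalRegistration {
  // Parses the stream with the given handler, registering it as LexicalHandler
  // when it is one, and closes the stream afterwards (also on failure).
  def parse(spf: SAXParserFactory, handler: DefaultHandler, input: InputStream): Unit =
    try {
      val reader = spf.newSAXParser().getXMLReader
      reader.setContentHandler(handler)
      reader.setErrorHandler(handler)
      reader.setEntityResolver(handler)
      reader.setDTDHandler(handler)
      handler match {
        case lh: LexicalHandler =>
          reader.setProperty("http://xml.org/sax/properties/lexical-handler", lh)
        case _ => ()
      }
      reader.parse(new InputSource(input))
    } finally {
      input.close()
    }

  // Hypothetical handler: DefaultHandler2 is a DefaultHandler that also implements LexicalHandler.
  final class CommentCounter extends DefaultHandler2 {
    var comments = 0
    override def comment(ch: Array[Char], start: Int, length: Int): Unit = comments += 1
  }

  // Hypothetical stream wrapper, used only to observe that close() was called.
  final class TrackingStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
    var closed = false
    override def close(): Unit = { closed = true; super.close() }
  }
}
```

Comments are only reported because the handler is registered as LexicalHandler. With a plain DefaultHandler, the match falls through and comments are silently skipped.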
Parses the content of the given File into a eu.cdevreeze.yaidom.Document.
Parses the content of the given URI into a eu.cdevreeze.yaidom.Document.
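One plausible way for the File and URI variants to relate to the stream-based one is simple delegation. The sketch below uses a hypothetical SourceOverloads object that only extracts the root element name instead of building a yaidom Document. Passing a system ID is an assumption of this sketch. It lets relative references, for example to a DTD, resolve against the original location.

```scala
import java.io.{File, FileInputStream, InputStream}
import java.net.URI
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

object SourceOverloads {
  // Core variant: parses the stream and closes it afterwards.
  // systemId (may be null) is used to resolve relative references.
  def rootName(input: InputStream, systemId: String): String = {
    var root = ""
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        if (root.isEmpty) root = qName
    }
    val source = new InputSource(input)
    source.setSystemId(systemId)
    try SAXParserFactory.newInstance().newSAXParser().parse(source, handler)
    finally input.close()
    root
  }

  // File variant: opens the file and delegates.
  def rootName(file: File): String =
    rootName(new FileInputStream(file), file.toURI.toString)

  // URI variant: opens the URI as a URL stream and delegates.
  def rootName(uri: URI): String =
    rootName(uri.toURL.openStream(), uri.toString)
}
```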
SAX-based Document parser.

Typical non-trivial creation is as follows, assuming a trait MyEntityResolver, which extends EntityResolver, and a trait MyErrorHandler, which extends ErrorHandler:
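Here is a self-contained sketch of that setup. It uses only JAXP, so it runs without yaidom. The bodies given to MyEntityResolver and MyErrorHandler are placeholders chosen for illustration. NameCollector merely stands in for yaidom's element-producing SAX handler. The yaidom call quoted in the comment is an assumption about the API and should be checked against the yaidom sources. That call is DocumentParserUsingSax.newInstance, taking a factory and a handler-creating function.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, EntityResolver, ErrorHandler, InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Placeholder resolver: returning null tells the parser to resolve entities the default way.
trait MyEntityResolver extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource = null
}

// Placeholder error handler: ignores warnings, rethrows errors and fatal errors.
trait MyErrorHandler extends ErrorHandler {
  override def warning(exc: SAXParseException): Unit = ()
  override def error(exc: SAXParseException): Unit = throw exc
  override def fatalError(exc: SAXParseException): Unit = throw exc
}

object TypicalCreation {
  // Namespace-aware factory that also reports xmlns attributes (namespace-prefixes feature).
  def newFactory(): SAXParserFactory = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)
    spf
  }

  // Stand-in for yaidom's element-producing handler, with both traits mixed in.
  // With yaidom, the assumed (unverified) equivalent would be roughly:
  //   DocumentParserUsingSax.newInstance(spf,
  //     () => new DefaultElemProducingSaxHandler with MyEntityResolver with MyErrorHandler)
  final class NameCollector extends DefaultHandler with MyEntityResolver with MyErrorHandler {
    val names: ListBuffer[String] = ListBuffer.empty[String]
    override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
      names += localName
  }

  // Parses the XML string and returns the local names of all elements in document order.
  def elementNames(xml: String): List[String] = {
    val handler = new NameCollector
    newFactory().newSAXParser().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler)
    handler.names.toList
  }
}
```

Mixing the traits into the handler works because DefaultHandler already implements EntityResolver and ErrorHandler. The traits' overriding members simply take precedence in the linearization.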
If we want the SAXParserFactory to be a validating one, using an XML Schema, we could obtain the SAXParserFactory as follows:
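Below is a sketch of such a factory, built on the standard javax.xml.validation API. The inline schema, the ValidatingFactory object and its isValid helper are hypothetical. The sketch relies on SAXParserFactory.setSchema rather than setValidating(true), because the latter essentially controls DTD validation only.

```scala
import java.io.{ByteArrayInputStream, StringReader}
import java.nio.charset.StandardCharsets
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXParseException
import org.xml.sax.helpers.DefaultHandler

object ValidatingFactory {
  // Hypothetical schema: a single root element "greeting" with string content.
  val schemaText: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:element name="greeting" type="xs:string"/>
      |</xs:schema>""".stripMargin

  def newValidatingFactory(xsd: String): SAXParserFactory = {
    val schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    val schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)))
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    // A non-null Schema makes every parser created by this factory validate against it.
    spf.setSchema(schema)
    spf
  }

  // Validation errors arrive via error(), which DefaultHandler ignores by default; rethrow them.
  private final class StrictHandler extends DefaultHandler {
    override def error(e: SAXParseException): Unit = throw e
  }

  def isValid(xml: String): Boolean = {
    val parser = newValidatingFactory(schemaText).newSAXParser()
    try {
      parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new StrictHandler)
      true
    } catch {
      case _: SAXParseException => false
    }
  }
}
```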
A custom EntityResolver could be used to retrieve DTDs locally, or even to suppress DTD resolution. The latter can be coded as follows (see http://stuartsierra.com/2008/05/08/stop-your-java-sax-parser-from-downloading-dtds), risking some loss of information:
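Here is a minimal sketch of the suppressing variant. Every external entity, including the external DTD subset, resolves to empty content, so nothing is downloaded. Any entities that would have been declared in that DTD are then unknown to the parser, which is where information can get lost. SuppressDtd and its RootNameHandler are hypothetical helpers.

```scala
import java.io.{ByteArrayInputStream, StringReader}
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, EntityResolver, InputSource}
import org.xml.sax.helpers.DefaultHandler

// Resolves every external entity (including the external DTD subset) to empty content,
// so the parser never tries to fetch it.
trait MyEntityResolver extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource =
    new InputSource(new StringReader(""))
}

object SuppressDtd {
  // Hypothetical handler that records the name of the root element.
  final class RootNameHandler extends DefaultHandler with MyEntityResolver {
    var root: Option[String] = None
    override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
      if (root.isEmpty) root = Some(qName)
  }

  def rootName(xml: String): Option[String] = {
    val handler = new RootNameHandler
    val parser = SAXParserFactory.newInstance().newSAXParser()
    parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler)
    handler.root
  }
}
```

Consider a document whose DOCTYPE points at an unreachable DTD URL. SuppressDtd.rootName still returns the root element name, because the resolver never lets the parser open that URL.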
For completeness, a custom ErrorHandler trait that simply prints parse exceptions to standard output:
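Such a trait might look as follows. Rethrowing from fatalError is an assumption of this sketch, made so that parsing stops deterministically. Warnings and recoverable errors are only printed. ErrorHandlerDemo is a hypothetical harness that captures standard output.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, PrintStream}
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{ErrorHandler, SAXException, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Prints every reported problem to standard output.
// Assumption of this sketch: fatal errors are also rethrown, so parsing stops at once.
trait MyErrorHandler extends ErrorHandler {
  override def warning(exc: SAXParseException): Unit = println(s"Warning: $exc")
  override def error(exc: SAXParseException): Unit = println(s"Error: $exc")
  override def fatalError(exc: SAXParseException): Unit = {
    println(s"Fatal error: $exc")
    throw exc
  }
}

object ErrorHandlerDemo {
  final class PrintingHandler extends DefaultHandler with MyErrorHandler

  // Parses the XML string; returns whether parsing succeeded and what was printed.
  def parseAndCapture(xml: String): (Boolean, String) = {
    val buffer = new ByteArrayOutputStream()
    val ok = Console.withOut(new PrintStream(buffer, true, "UTF-8")) {
      try {
        SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new PrintingHandler)
        true
      } catch {
        case _: SAXException => false
      }
    }
    (ok, buffer.toString("UTF-8"))
  }
}
```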
It is even possible to parse HTML (including very poor HTML) into well-formed Documents by using a SAXParserFactory from the TagSoup library. For example:
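This works because parsing code written against the abstract SAXParserFactory does not care which implementation it receives. The sketch below runs with the JDK default factory, and TagSoup is not needed to compile it. The TagSoup factory class named in the comment, org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl, is given from memory and should be verified against the TagSoup distribution. PluggableFactory is a hypothetical helper.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

object PluggableFactory {
  // Depends only on the abstract JAXP SAXParserFactory, so any implementation can be supplied.
  def elementNames(factory: SAXParserFactory, input: String): List[String] = {
    val names = List.newBuilder[String]
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        names += qName
    }
    factory.newSAXParser().parse(new InputSource(new StringReader(input)), handler)
    names.result()
  }
}

// With TagSoup on the classpath (class name assumed, not verified here), poor HTML
// could be parsed by swapping in its factory:
//   PluggableFactory.elementNames(new org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl, "<p>unclosed <b>tags")
```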
If more flexibility is needed in configuring the DocumentParser than this class offers, consider writing a wrapper DocumentParser that wraps a DocumentParserUsingSax but adapts the parse method. This would make it possible to set additional properties on the XML Reader, for example.
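The kind of hook such a wrapper could add is sketched here with plain JAXP. The XMLReader is obtained from the SAXParser and handed to a configuration function before parsing. ReaderConfiguringParser and denyExternalDtd are hypothetical names. The property set in the example is the standard JAXP 1.5 XMLConstants.ACCESS_EXTERNAL_DTD, where an empty value forbids fetching external DTDs.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXException, XMLReader}
import org.xml.sax.helpers.DefaultHandler

object ReaderConfiguringParser {
  // Hypothetical hook: the XMLReader is passed through `configure` before parsing.
  def parse(xml: String, configure: XMLReader => Unit): Unit = {
    val reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader
    val handler = new DefaultHandler // its fatalError rethrows, so failures surface
    reader.setContentHandler(handler)
    reader.setErrorHandler(handler)
    configure(reader)
    reader.parse(new InputSource(new StringReader(xml)))
  }

  // Example configuration: forbid access to external DTDs via the JAXP 1.5 property.
  val denyExternalDtd: XMLReader => Unit =
    reader => reader.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "")

  // Returns false if parsing fails with a SAX exception.
  def succeeds(xml: String, configure: XMLReader => Unit): Boolean =
    try { parse(xml, configure); true } catch { case _: SAXException => false }
}
```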
As can be seen above, parsing is based on the JAXP SAXParserFactory instead of the SAX 2.0 XMLReaderFactory.

A DocumentParserUsingSax instance can be re-used multiple times from the same thread. If the SAXParserFactory is thread-safe, it can even be re-used from multiple threads. However, a SAXParserFactory typically cannot be trusted to be thread-safe. In a web application, one safe way to deal with that is to use one SAXParserFactory instance per request.
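Below is a sketch of the one-factory-per-request approach, with each simulated request running in its own Future. PerRequestFactory is hypothetical. Creating a fresh factory per call trades some allocation cost for not having to reason about sharing. A ThreadLocal holding one factory per thread is a common alternative, but it is not what the text describes.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PerRequestFactory {
  // Handles one "request": it creates its own SAXParserFactory, so nothing mutable
  // is shared with requests running on other threads.
  def handleRequest(xml: String): String = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    var root = ""
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        if (root.isEmpty) root = localName
    }
    spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
    root
  }

  // Simulates concurrent requests; each Future gets its own factory via handleRequest.
  def handleConcurrently(docs: Seq[String]): Seq[String] =
    Await.result(Future.sequence(docs.map(doc => Future(handleRequest(doc)))), 30.seconds)
}
```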