eu.cdevreeze.yaidom.parse

DocumentParserUsingSax

final class DocumentParserUsingSax extends AbstractDocumentParser

SAX-based Document parser.

A typical non-trivial instance is created as follows, assuming a trait MyEntityResolver (extending EntityResolver) and a trait MyErrorHandler (extending ErrorHandler), both shown further below:

import javax.xml.parsers.SAXParserFactory
import eu.cdevreeze.yaidom.parse.{DefaultElemProducingSaxHandler, DocumentParserUsingSax}

val spf = SAXParserFactory.newInstance()
spf.setFeature("http://xml.org/sax/features/namespaces", true)
spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)

val parser = DocumentParserUsingSax.newInstance(
  spf,
  () => new DefaultElemProducingSaxHandler with MyEntityResolver with MyErrorHandler
)
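Once created, a parser is used through its parse methods, which are listed under Value Members. The following is a minimal sketch that parses an in-memory string. It uses the one-argument newInstance overload, which also appears in the TagSoup example, and assumes the yaidom simple Document API (documentElement, resolvedName). The names ParseDemo and parseString are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import eu.cdevreeze.yaidom.parse.DocumentParserUsingSax
import eu.cdevreeze.yaidom.simple.Document

object ParseDemo {
  // Namespace-aware factory, configured as in the example above
  private val spf: SAXParserFactory = SAXParserFactory.newInstance()
  spf.setFeature("http://xml.org/sax/features/namespaces", true)
  spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)

  val parser: DocumentParserUsingSax = DocumentParserUsingSax.newInstance(spf)

  // Hypothetical helper; parse(InputStream) closes the stream afterwards
  def parseString(xml: String): Document =
    parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
}
```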

If we want the SAXParserFactory to validate against an XML Schema, we could obtain it as follows:

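import java.io.File
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory

// pathToSchema: the path of the XSD file
val schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
val schemaSource = new StreamSource(new File(pathToSchema))
val schema = schemaFactory.newSchema(schemaSource)

val spf = {
  val result = SAXParserFactory.newInstance()
  result.setFeature("http://xml.org/sax/features/namespaces", true)
  result.setFeature("http://xml.org/sax/features/namespace-prefixes", true)
  result.setSchema(schema)
  result
}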
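A factory with a Schema set reports validation errors through the ErrorHandler's error method. The DefaultHandler implementation of that method does nothing, so an invalid document is silently accepted unless the handler reacts.

The following minimal sketch demonstrates this with plain JAXP, without yaidom. The inline schema, the sample documents, and the names ValidationDemo, StrictHandler and parses are hypothetical.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object ValidationDemo {
  // Hypothetical schema: the root element "greeting" must contain an integer
  val xsd: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:element name="greeting" type="xs:int"/>
      |</xs:schema>""".stripMargin

  // Turns recoverable (validation) errors into failures
  class StrictHandler extends DefaultHandler {
    override def error(exc: SAXParseException): Unit = throw exc
  }

  def newValidatingFactory(): SAXParserFactory = {
    val schema = SchemaFactory
      .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
      .newSchema(new StreamSource(new StringReader(xsd)))
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    spf.setSchema(schema)
    spf
  }

  // Returns true if parsing completes without a SAXParseException
  def parses(xml: String, handler: DefaultHandler): Boolean =
    try {
      newValidatingFactory().newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
      true
    } catch {
      case _: SAXParseException => false
    }
}
```

The third assertion in practice is the important one. A plain DefaultHandler lets the invalid document through, which is why a validating setup needs an ErrorHandler that does something with error.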

A custom EntityResolver could be used to retrieve DTDs locally, or even to suppress DTD resolution. The latter can be coded as follows (see http://stuartsierra.com/2008/05/08/stop-your-java-sax-parser-from-downloading-dtds), at the risk of losing information defined in the DTD, such as entity declarations and default attribute values:

import org.xml.sax.{EntityResolver, InputSource}

trait MyEntityResolver extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource = {
    // This dirty hack may not work on IBM JVMs
    new InputSource(new java.io.StringReader(""))
  }
}
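Retrieving DTDs locally, the other option mentioned above, could look like the following minimal sketch. It assumes the DTDs are held in memory and keyed by system ID. The map contents, the URL, and the names LocalDtdDemo, LocalEntityResolver, TextCollector and textOf are all hypothetical.

Returning null from resolveEntity tells the parser to fall back to its default resolution, as specified by the SAX EntityResolver contract.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{EntityResolver, InputSource}
import org.xml.sax.helpers.DefaultHandler

object LocalDtdDemo {
  // Hypothetical in-memory catalog from system ID to DTD text
  val localDtds: Map[String, String] = Map(
    "http://example.com/dtd/note.dtd" ->
      """<!ELEMENT note (#PCDATA)>
        |<!ENTITY author "Jane Doe">""".stripMargin
  )

  trait LocalEntityResolver extends EntityResolver {
    override def resolveEntity(publicId: String, systemId: String): InputSource =
      localDtds.get(systemId) match {
        case Some(dtd) =>
          val source = new InputSource(new StringReader(dtd))
          source.setSystemId(systemId)
          source
        case None => null // not known locally: use the parser's default resolution
      }
  }

  // Collects all character content, with entity references already expanded
  class TextCollector extends DefaultHandler with LocalEntityResolver {
    val text = new java.lang.StringBuilder
    override def characters(ch: Array[Char], start: Int, length: Int): Unit =
      text.append(ch, start, length)
  }

  def textOf(xml: String): String = {
    val handler = new TextCollector
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new InputSource(new StringReader(xml)), handler)
    handler.text.toString
  }
}
```

The mixin style mirrors MyEntityResolver. In a yaidom setting, the trait would presumably be mixed into DefaultElemProducingSaxHandler instead of the TextCollector test handler.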

For completeness, a custom ErrorHandler trait that simply prints parse exceptions to standard output:

import org.xml.sax.{ErrorHandler, SAXParseException}

trait MyErrorHandler extends ErrorHandler {
  override def warning(exc: SAXParseException): Unit = { println(exc) }
  override def error(exc: SAXParseException): Unit = { println(exc) }
  override def fatalError(exc: SAXParseException): Unit = { println(exc) }
}

It is even possible to parse HTML, including badly malformed HTML, into well-formed Documents by using the SAXParserFactory implementation from the TagSoup library. For example:

val parser = DocumentParserUsingSax.newInstance(new org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl)
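The following minimal sketch applies such a parser to a fragment with unclosed tags. It assumes TagSoup and yaidom are on the classpath, and that TagSoup wraps the fragment in an html root element, which is its usual behaviour. HtmlDemo and parseHtml are hypothetical names.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import eu.cdevreeze.yaidom.parse.DocumentParserUsingSax
import eu.cdevreeze.yaidom.simple.Document
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl

object HtmlDemo {
  // TagSoup's SAXFactoryImpl is a javax.xml.parsers.SAXParserFactory
  val htmlParser: DocumentParserUsingSax =
    DocumentParserUsingSax.newInstance(new SAXFactoryImpl)

  // TagSoup repairs the markup (unclosed <p>, <br>) before yaidom builds the tree
  def parseHtml(html: String): Document =
    htmlParser.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)))
}
```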

If this class does not offer enough flexibility in configuring the DocumentParser, consider writing a wrapper DocumentParser that delegates to a DocumentParserUsingSax but adapts the parse method. This would make it possible, for example, to set additional properties on the XMLReader.
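The following minimal sketch shows one such wrapper. It extends AbstractDocumentParser, so that only parse(InputStream) needs to be implemented. The parse(File) and parse(URI) overloads are final in AbstractDocumentParser and are inherited.

The adaptation chosen here is an assumption made for illustration: a fresh DocumentParserUsingSax, built from a fresh SAXParserFactory, is created for every call. This sidesteps the thread-safety concerns discussed below. FreshParserPerCall and newFactory are hypothetical names.

```scala
import java.io.InputStream
import javax.xml.parsers.SAXParserFactory
import eu.cdevreeze.yaidom.parse.{AbstractDocumentParser, DocumentParserUsingSax}
import eu.cdevreeze.yaidom.simple.Document

// Hypothetical wrapper; parse(File) and parse(URI) come from AbstractDocumentParser
final class FreshParserPerCall(newFactory: () => SAXParserFactory) extends AbstractDocumentParser {
  override def parse(inputStream: InputStream): Document = {
    // No SAXParserFactory instance is ever shared between calls (or threads)
    val delegate = DocumentParserUsingSax.newInstance(newFactory())
    delegate.parse(inputStream)
  }
}
```

Creating a factory per call costs some performance. Whether that matters depends on document sizes and call rates.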

As can be seen above, parsing is based on the JAXP SAXParserFactory instead of the SAX 2.0 XMLReaderFactory.

A DocumentParserUsingSax instance can be reused multiple times from the same thread. If the SAXParserFactory is thread-safe, the instance can even be reused from multiple threads. Typically, however, a SAXParserFactory cannot be trusted to be thread-safe. In a web application, one safe way to deal with this is to use one SAXParserFactory instance per request.
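An alternative to one factory per request, swapped in here as a different technique, is one parser per thread via java.lang.ThreadLocal. This keeps each DocumentParserUsingSax, and its SAXParserFactory, confined to a single thread. The following is a minimal sketch. PerThreadParser is a hypothetical name, and the factory configuration follows the examples above.

```scala
import javax.xml.parsers.SAXParserFactory
import eu.cdevreeze.yaidom.parse.DocumentParserUsingSax

object PerThreadParser {
  private def newParser(): DocumentParserUsingSax = {
    val spf = SAXParserFactory.newInstance()
    spf.setFeature("http://xml.org/sax/features/namespaces", true)
    spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)
    DocumentParserUsingSax.newInstance(spf)
  }

  // Each thread lazily gets its own parser; instances are never shared across threads
  private val perThread: ThreadLocal[DocumentParserUsingSax] =
    new ThreadLocal[DocumentParserUsingSax] {
      override protected def initialValue(): DocumentParserUsingSax = newParser()
    }

  def get: DocumentParserUsingSax = perThread.get()
}
```

In a servlet container with pooled threads, these parsers live as long as the pool threads do. That is usually acceptable, but it is a difference from the per-request approach.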

Linear Supertypes

AbstractDocumentParser, DocumentParser, AnyRef, Any

Instance Constructors

  1. new DocumentParserUsingSax(parserFactory: SAXParserFactory, parserCreator: (SAXParserFactory) ⇒ SAXParser, handlerCreator: () ⇒ ElemProducingSaxHandler)
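The primary constructor takes a parserCreator function. This function is a hook for configuring the SAXParser before yaidom uses it.

The following minimal sketch uses that hook to forbid fetching external DTDs. It assumes the JDK's built-in parser, which has supported the JAXP 1.5 property XMLConstants.ACCESS_EXTERNAL_DTD since JDK 7u40. Other parsers may reject that property with a SAXNotRecognizedException. ConfiguredParser and createParser are hypothetical names.

```scala
import javax.xml.XMLConstants
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import eu.cdevreeze.yaidom.parse.{DefaultElemProducingSaxHandler, DocumentParserUsingSax}

object ConfiguredParser {
  val spf: SAXParserFactory = SAXParserFactory.newInstance()
  spf.setFeature("http://xml.org/sax/features/namespaces", true)
  spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)

  // Passed as parserCreator: (SAXParserFactory) => SAXParser
  def createParser(factory: SAXParserFactory): SAXParser = {
    val parser = factory.newSAXParser()
    // Empty string: external DTDs may not be fetched over any protocol
    parser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "")
    parser
  }

  val docParser: DocumentParserUsingSax = new DocumentParserUsingSax(
    spf,
    createParser,
    () => new DefaultElemProducingSaxHandler {}
  )
}
```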

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. val handlerCreator: () ⇒ ElemProducingSaxHandler

  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. def parse(inputStream: InputStream): Document

    Parses the input stream into a yaidom Document. Closes the input stream afterwards.

    If the handler created by handlerCreator is also a LexicalHandler, it is registered as such, so that lexical events such as comments are reported. In practice, all SAX parsers should support LexicalHandlers.

    Definition Classes
    DocumentParserUsingSax → DocumentParser
  17. final def parse(file: File): Document

    Parses the content of the given File into a eu.cdevreeze.yaidom.simple.Document.

    Definition Classes
    AbstractDocumentParser → DocumentParser
  18. final def parse(uri: URI): Document

    Parses the content of the given URI into a eu.cdevreeze.yaidom.simple.Document.

    Definition Classes
    AbstractDocumentParser → DocumentParser
  19. val parserCreator: (SAXParserFactory) ⇒ SAXParser

  20. val parserFactory: SAXParserFactory

  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def toString(): String

    Definition Classes
    AnyRef → Any
  23. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
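The parse(inputStream) description mentions registering the handler as a LexicalHandler. In plain JAXP, this is done through the standard SAX property http://xml.org/sax/properties/lexical-handler.

The following minimal sketch shows that mechanism with a hypothetical comment-collecting handler. It illustrates the SAX API only, not yaidom's internal code. LexicalDemo, CommentCollector and commentsOf are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource
import org.xml.sax.ext.DefaultHandler2

object LexicalDemo {
  // DefaultHandler2 is both a DefaultHandler and a LexicalHandler
  class CommentCollector extends DefaultHandler2 {
    val comments = scala.collection.mutable.ListBuffer.empty[String]
    override def comment(ch: Array[Char], start: Int, length: Int): Unit =
      comments += new String(ch, start, length)
  }

  def commentsOf(xml: String): List[String] = {
    val handler = new CommentCollector
    val parser = SAXParserFactory.newInstance().newSAXParser()
    // Without this property, comment events are never delivered to the handler
    parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler)
    parser.parse(new InputSource(new StringReader(xml)), handler)
    handler.comments.toList
  }
}
```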
