eu.cdevreeze.yaidom.parse

DocumentParserUsingSax

final class DocumentParserUsingSax extends AbstractDocumentParser

SAX-based Document parser.

Typical non-trivial creation is as follows, assuming a trait MyEntityResolver, which extends EntityResolver, and a trait MyErrorHandler, which extends ErrorHandler:

val spf = SAXParserFactory.newInstance().makeNamespaceAndPrefixAware

val parser = DocumentParserUsingSax.newInstance(
  spf,
  () => new DefaultElemProducingSaxHandler with MyEntityResolver with MyErrorHandler
)

If we want the SAXParserFactory to be a validating one, using an XML Schema, we could obtain the SAXParserFactory as follows:

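import java.io.File
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory

val schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
val schemaSource = new StreamSource(new File(pathToSchema))
val schema = schemaFactory.newSchema(schemaSource)

val spf = {
  // Same namespace-aware factory setup as in the first snippet, plus the schema to validate against
  val result = SAXParserFactory.newInstance().makeNamespaceAndPrefixAware
  result.setSchema(schema)
  result
}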
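
For readers who want to try schema validation without yaidom on the classpath, the following is a minimal sketch using plain JAXP only. The inline schema, the object name ValidatingFactorySketch and the helper validationErrors are hypothetical names introduced for illustration. The call setNamespaceAware(true) is used as a stand-in, on the assumption that it covers what schema validation needs from makeNamespaceAndPrefixAware. Note that validation errors are reported to the ErrorHandler and do not abort the parse by themselves.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object ValidatingFactorySketch {
  // Hypothetical inline schema: a single <note> element holding a string
  val xsd: String =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      |  <xs:element name="note" type="xs:string"/>
      |</xs:schema>""".stripMargin

  val schema = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    .newSchema(new StreamSource(new StringReader(xsd)))

  // Validating factory (assumption: plain namespace awareness instead of the yaidom helper)
  val spf: SAXParserFactory = {
    val result = SAXParserFactory.newInstance()
    result.setNamespaceAware(true)
    result.setSchema(schema)
    result
  }

  // Returns the validation errors reported while parsing the given XML string
  def validationErrors(xml: String): List[SAXParseException] = {
    var errors = List.empty[SAXParseException]
    val handler = new DefaultHandler {
      override def error(exc: SAXParseException): Unit = errors ::= exc
    }
    spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
    errors.reverse
  }
}
```

With yaidom, a factory configured like spf would be passed to DocumentParserUsingSax.newInstance, together with a handler creator whose ErrorHandler records or rethrows the reported errors.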

A custom EntityResolver could be used to retrieve DTDs locally, or even to suppress DTD resolution altogether. Suppressing it risks some loss of information, because entity declarations and default attribute values defined in the DTD are no longer seen. It can be coded as follows (see http://stuartsierra.com/2008/05/08/stop-your-java-sax-parser-from-downloading-dtds):

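import org.xml.sax.{EntityResolver, InputSource}

trait MyEntityResolver extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource = {
    // Answer every external entity (including the DTD) with empty content instead of fetching it.
    // This dirty hack may not work on IBM JVMs
    new InputSource(new java.io.StringReader(""))
  }
}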
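
To see the effect of such a resolver in isolation, here is a small sketch that uses only the JDK's SAX parser, without yaidom. The names SuppressDtdSketch and RecordingHandler, as well as the unreachable DTD URL used in the usage note, are hypothetical. The handler answers every external entity request with empty content and records which system IDs were requested.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

object SuppressDtdSketch {
  // Records requested external entities and answers each one with empty content
  final class RecordingHandler extends DefaultHandler {
    var requestedSystemIds: List[String] = Nil
    var elementNames: List[String] = Nil

    override def resolveEntity(publicId: String, systemId: String): InputSource = {
      requestedSystemIds ::= systemId
      new InputSource(new StringReader(""))
    }

    override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
      elementNames ::= qName
  }

  // Parses the XML string; SAXParser.parse installs the handler as EntityResolver too
  def parse(xml: String): RecordingHandler = {
    val handler = new RecordingHandler
    val saxParser = SAXParserFactory.newInstance().newSAXParser()
    saxParser.parse(new InputSource(new StringReader(xml)), handler)
    handler
  }
}
```

Calling SuppressDtdSketch.parse on a document such as `<!DOCTYPE root SYSTEM "http://example.invalid/root.dtd"><root/>` is expected to finish without a network fetch, with the DTD's system ID showing up in requestedSystemIds. As the comment in the trait above warns, this behaviour may differ between JVMs.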

For completeness, a custom ErrorHandler trait that simply prints parse exceptions to standard output:

import org.xml.sax.{ErrorHandler, SAXParseException}

trait MyErrorHandler extends ErrorHandler {
  override def warning(exc: SAXParseException): Unit = { println(exc) }
  override def error(exc: SAXParseException): Unit = { println(exc) }
  override def fatalError(exc: SAXParseException): Unit = { println(exc) }
}

It is even possible to parse HTML (including very poor HTML) into well-formed Documents by using a SAXParserFactory from the TagSoup library. For example:

val parser = DocumentParserUsingSax.newInstance(new org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl)

If more flexibility is needed in configuring the DocumentParser than offered by this class, consider writing a wrapper DocumentParser which wraps a DocumentParserUsingSax, but adapts the parse method. This would make it possible to set additional properties on the XML Reader, for example.

As can be seen above, parsing is based on the JAXP SAXParserFactory instead of the SAX 2.0 XMLReaderFactory.

A DocumentParserUsingSax instance can be re-used multiple times, from the same thread. If the SAXParserFactory is thread-safe, it can even be re-used from multiple threads. Typically a SAXParserFactory cannot be trusted to be thread-safe, however. In a web application, one (safe) way to deal with that is to use one SAXParserFactory instance per request.
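
The one-factory-per-request approach can be sketched with plain JAXP as follows. This is only an illustration under stated assumptions: the object PerRequestFactorySketch and its methods newFactory and handleRequest are hypothetical names, and the "request" is reduced to counting the elements of an XML string.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

object PerRequestFactorySketch {
  // Creates a fresh factory on every call, so no factory is ever shared between threads
  def newFactory(): SAXParserFactory = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    spf
  }

  // Hypothetical request handler: counts the elements in the given XML string
  def handleRequest(xml: String): Int = {
    val spf = newFactory() // one factory per request
    var count = 0
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        count += 1
    }
    spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
    count
  }
}
```

With yaidom, the per-request factory would instead be handed to DocumentParserUsingSax.newInstance, giving each request its own parser. Creating a factory per request costs some lookup time, which is the price paid for not relying on the factory being thread-safe.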

Linear Supertypes

AbstractDocumentParser, DocumentParser, AnyRef, Any

Instance Constructors

  1. new DocumentParserUsingSax(parserFactory: SAXParserFactory, parserCreator: (SAXParserFactory) ⇒ SAXParser, handlerCreator: () ⇒ ElemProducingSaxHandler)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. val handlerCreator: () ⇒ ElemProducingSaxHandler
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. def parse(inputSource: InputSource): Document

    Parses the input source into a yaidom Document. Closes the input stream or reader afterwards.

    If the created DefaultHandler is a LexicalHandler, this LexicalHandler is registered. In practice all SAX parsers should support LexicalHandlers.

    Definition Classes
    DocumentParserUsingSax → DocumentParser
  17. final def parse(file: File): Document

    Parses the content of the given File into a eu.cdevreeze.yaidom.simple.Document.

    Definition Classes
    AbstractDocumentParser → DocumentParser
  18. final def parse(uri: URI): Document

    Parses the content of the given URI into a eu.cdevreeze.yaidom.simple.Document.

    Definition Classes
    AbstractDocumentParser → DocumentParser
  19. final def parse(inputStream: InputStream): Document

    Parses the content of the given input stream into a eu.cdevreeze.yaidom.simple.Document.

    Definition Classes
    AbstractDocumentParser → DocumentParser
  20. val parserCreator: (SAXParserFactory) ⇒ SAXParser
  21. val parserFactory: SAXParserFactory
  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
