ConstructingParser

class ConstructingParser(val input: Source, val preserveWS: Boolean) extends ConstructingHandler with ExternalSources with MarkupParser

An XML parser. It parses XML and invokes the callback methods of a MarkupHandler. A freshly instantiated parser must be initialized by calling nextch() once before use. A parser obtained from the companion object's fromSource method is already initialized.

object parseFromURL {
  def main(args: Array[String]): Unit = {
    val url = args(0)
    val src = scala.io.Source.fromURL(url)
    // fromSource returns a parser that is already initialized
    val cpa = scala.xml.parsing.ConstructingParser.fromSource(src, false)
    val doc = cpa.document()

    // Pretty-print the root element (line width 80, indentation step 5)
    val ppr = new scala.xml.PrettyPrinter(80, 5)
    val ele = doc.docElem
    println("finished parsing")
    val out = ppr.format(ele)
    println(out)
  }
}
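The example above relies on fromSource for initialization. A minimal sketch of the other path, constructing the parser directly and priming it with nextch(), might look like this. The object name ManualInit and the helper parse are hypothetical; only ConstructingParser, nextch(), document() and docElem come from this page.

```scala
import scala.io.Source
import scala.xml.Node
import scala.xml.parsing.ConstructingParser

// Hypothetical helper object, for illustration only.
object ManualInit {
  def parse(xml: String): Node = {
    // preserveWS = false, as in the example above
    val parser = new ConstructingParser(Source.fromString(xml), false)
    parser.nextch()           // manual initialization: read the first character
    parser.document().docElem // parse the document and return its root element
  }
}
```

Skipping the nextch() call leaves the parser without a current character, which is why fromSource is usually the more convenient entry point.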
Companion:
object
trait MarkupParserCommon
class Object
trait Matchable
class Any

Type members

Inherited types

Inherited from:
MarkupParser
Inherited from:
MarkupParser
Inherited from:
MarkupParser

Value members

Inherited methods

def appendText(pos: Int, ts: NodeBuffer, txt: String): Unit
Inherited from:
MarkupParser
def attListDecl(name: String, attList: List[AttrDecl]): Unit
Inherited from:
MarkupHandler
def attrDecl(): Unit
<! attlist := ATTLIST
Inherited from:
MarkupParser
def ch: Char
Inherited from:
MarkupParser
protected def ch_returning_nextch: Char
Inherited from:
MarkupParser
Inherited from:
TokenTests
Inherited from:
TokenTests
def comment(pos: Int, txt: String): NodeSeq
Inherited from:
ConstructingHandler
content1 ::=  '<' content1 | '&' charref ...
Inherited from:
MarkupParser
'<' content1 ::=  ...
Inherited from:
MarkupParser
[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S
Inherited from:
MarkupParser
def elem(pos: Int, pre: String, label: String, attrs: MetaData, pscope: NamespaceBinding, empty: Boolean, nodes: NodeSeq): NodeSeq
Inherited from:
ConstructingHandler
def elemDecl(n: String, cmstr: String): Unit
Inherited from:
MarkupHandler
def elemEnd(pos: Int, pre: String, label: String): Unit

Callback method invoked by MarkupParser after the end tag of an element.

Value parameters:
label: the local name
pos: the position in the source file
pre: the prefix

Inherited from:
MarkupHandler
def elemStart(pos: Int, pre: String, label: String, attrs: MetaData, scope: NamespaceBinding): Unit

Callback method invoked by MarkupParser after the start tag of an element.

Value parameters:
attrs: the attributes (metadata)
label: the local name
pos: the position in the source file
pre: the prefix

Inherited from:
MarkupHandler
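Because elemStart and elemEnd are plain callbacks, a subclass can override them to observe parsing. The sketch below records element labels in the order their start tags are seen. LabelRecorder, LabelRecorderDemo and labelsOf are hypothetical names; the elemStart signature is the one listed above.

```scala
import scala.collection.mutable.ListBuffer
import scala.io.Source
import scala.xml.{MetaData, NamespaceBinding}
import scala.xml.parsing.ConstructingParser

// Hypothetical subclass: collects each element label when its start tag is parsed.
class LabelRecorder(src: Source) extends ConstructingParser(src, false) {
  val labels: ListBuffer[String] = ListBuffer.empty[String]

  override def elemStart(pos: Int, pre: String, label: String,
                         attrs: MetaData, scope: NamespaceBinding): Unit = {
    labels += label
    ()
  }
}

object LabelRecorderDemo {
  def labelsOf(xml: String): List[String] = {
    val parser = new LabelRecorder(Source.fromString(xml))
    parser.nextch()   // a directly constructed parser must be initialized first
    parser.document() // drives the callbacks while building the document
    parser.labels.toList
  }
}
```

The parser still builds the usual Document; the override only adds a side effect.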
Inherited from:
MarkupParser
'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'
Inherited from:
MarkupParser

<! element := ELEMENT

Inherited from:
MarkupParser
def endDTD(n: String): Unit
Inherited from:
MarkupHandler
<! element := ELEMENT
Inherited from:
MarkupParser
def entityRef(pos: Int, n: String): NodeSeq
Inherited from:
ConstructingHandler
Inherited from:
MarkupParser
protected def errorAndResult[T](msg: String, x: T): T
Inherited from:
MarkupParserCommon
def errorNoEnd(tag: String): Nothing
Inherited from:
MarkupParser
Inherited from:
MarkupParser
externalID ::= SYSTEM S syslit
               PUBLIC S pubid S syslit
Inherited from:
MarkupParser
def externalSource(systemId: String): Source
Inherited from:
ExternalSources

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.

Inherited from:
MarkupParser

"rec-xml/#ExtSubset" pe references may not occur within markup declarations

Inherited from:
MarkupParser

These are 99% sure to be redundant but refactoring on the safe side.

Inherited from:
TokenTests
Inherited from:
TokenTests

Name ::= ( Letter | '_' ) (NameChar)*

See [5] of XML 1.0 specification.

Inherited from:
TokenTests
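Assuming this entry documents isName(s: String) from TokenTests, the production can be tried out through scala.xml.Utility, which mixes in TokenTests. The object name NameChecks is hypothetical.

```scala
import scala.xml.Utility

// Hypothetical demo object: checks a few candidate names against the Name production.
object NameChecks {
  val plain: Boolean      = Utility.isName("foo-bar") // letter, then NameChars
  val underscore: Boolean = Utility.isName("_x")      // '_' is a valid first character
  val digitFirst: Boolean = Utility.isName("1abc")    // a digit is a NameChar, not a name start
  val empty: Boolean      = Utility.isName("")        // the empty string is not a Name
}
```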

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

See [4] and [4a] of Appendix B of XML 1.0 specification.

Inherited from:
TokenTests

NameStart ::= ( Letter | '_' | ':' )

where Letter means a character in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

We do not allow a name to start with :. See [4] and Appendix B of XML 1.0 specification.

Inherited from:
TokenTests
Inherited from:
TokenTests
final def isSpace(cs: Seq[Char]): Boolean
(#x20 | #x9 | #xD | #xA)+
Inherited from:
TokenTests
final def isSpace(ch: Char): Boolean
(#x20 | #x9 | #xD | #xA)
Inherited from:
TokenTests
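The two isSpace overloads can be exercised through scala.xml.Utility, which mixes in TokenTests. This is a small usage sketch; the object name SpaceChecks is hypothetical. The Seq[Char] overload follows the `+` in its grammar, so an empty sequence does not count as space.

```scala
import scala.xml.Utility

// Hypothetical demo object for the two isSpace overloads.
object SpaceChecks {
  val blank: Boolean  = Utility.isSpace(' ')          // #x20
  val letter: Boolean = Utility.isSpace('a')          // not whitespace
  val run: Boolean    = Utility.isSpace(" \t\r\n".toSeq) // one or more whitespace characters
  val none: Boolean   = Utility.isSpace("".toSeq)     // empty: the '+' requires at least one
}
```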
def isValidIANAEncoding(ianaEncoding: Seq[Char]): Boolean

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value parameters:
ianaEncoding: The IANA encoding name.

Inherited from:
TokenTests
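To illustrate that only the characters are checked, here is a short usage sketch via scala.xml.Utility (which mixes in TokenTests). The object name EncodingChecks and the made-up encoding name are assumptions for illustration.

```scala
import scala.xml.Utility

// Hypothetical demo object: syntactic checks on encoding names.
object EncodingChecks {
  val utf8: Boolean       = Utility.isValidIANAEncoding("UTF-8".toSeq)
  // Syntactically fine even though no such decoder exists.
  val madeUp: Boolean     = Utility.isValidIANAEncoding("no-such-encoding".toSeq)
  // An encoding name has to start with a letter.
  val digitFirst: Boolean = Utility.isValidIANAEncoding("8859-1".toSeq)
  val empty: Boolean      = Utility.isValidIANAEncoding("".toSeq)
}
```

A caller that needs a working decoder would still have to check the name separately, for example with java.nio.charset.Charset.isSupported.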
Inherited from:
MarkupHandler
Inherited from:
MarkupParser
def markupDecl1(): Matchable
Inherited from:
MarkupParser
def mkProcInstr(position: Int, name: String, text: String): ElementType
Inherited from:
MarkupParser
def nextch(): Unit

Tells ch to fetch the next character the next time it is called.

Inherited from:
MarkupParser
'N' notationDecl ::= "OTATION"
Inherited from:
MarkupParser
def notationDecl(notat: String, extID: ExternalID): Unit
Inherited from:
MarkupHandler
Inherited from:
MarkupHandler
def parseDTD(): Unit

Parses the document type declaration and assigns it to the instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >
Inherited from:
MarkupParser
Inherited from:
MarkupHandler
def peReference(name: String): Unit
Inherited from:
MarkupHandler
def pop(): Unit
Inherited from:
MarkupParser
def procInstr(pos: Int, target: String, txt: String): NodeSeq
Inherited from:
ConstructingHandler
<? prolog ::= xml S?
// this is a bit more lenient than necessary...
Inherited from:
MarkupParser
[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
Inherited from:
MarkupParser
def push(entityName: String): Unit
Inherited from:
MarkupParser
def pushExternal(systemId: String): Unit
Inherited from:
MarkupParser
protected def putChar(c: Char): StringBuilder

Appends a Unicode character to the name buffer.

Inherited from:
MarkupParser
def replacementText(entityName: String): Source
Inherited from:
MarkupHandler
Inherited from:
MarkupParser
def reportSyntaxError(pos: Int, str: String): Unit
Inherited from:
MarkupParser
Inherited from:
MarkupParser
def returning[T](x: T)(f: T => Unit): T

Apply a function and return the passed value.

Inherited from:
MarkupParserCommon
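A standalone sketch of what a helper with this signature does, written from the one-line description rather than copied from the library: the function runs for its side effect and the original value is handed back. The object name ReturningSketch is hypothetical.

```scala
// Hypothetical standalone version, for illustration only.
object ReturningSketch {
  def returning[T](x: T)(f: T => Unit): T = {
    f(x) // run the side effect on x
    x    // hand x back unchanged
  }

  // Usage: configure a buffer and return it in one expression.
  val greeting: String =
    returning(new StringBuilder)(_.append("hello")).toString
}
```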
def saving[A, B](getter: A, setter: A => Unit)(body: => B): B

Execute body with a variable saved and restored after execution.

Inherited from:
MarkupParserCommon
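The save-and-restore pattern described here can be sketched on its own as follows. This is an illustration under the assumption that restoration also happens when body throws; SavingSketch and depth are hypothetical names.

```scala
// Hypothetical standalone version, for illustration only.
object SavingSketch {
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter           // capture the current value
    try body                     // run the body, which may change the variable
    finally setter(saved)        // restore the captured value afterwards
  }

  var depth: Int = 0

  // The body sees the modified value; the old value is back once it finishes.
  def insideValue(): Int =
    saving[Int, Int](depth, depth = _) {
      depth = 5
      depth
    }
}
```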

Attribute value, terminated by either ' or ". The value may not contain <.

     AttValue     ::= `'` { _ } `'`
                    | `"` { _ } `"`
Inherited from:
MarkupParser
def text(pos: Int, txt: String): NodeSeq
Inherited from:
ConstructingHandler

prolog, but without standalone

Inherited from:
MarkupParser
def truncatedError(msg: String): Nothing
Inherited from:
MarkupParser
def unparsedEntityDecl(name: String, extID: ExternalID, notat: String): Unit
Inherited from:
MarkupHandler
protected def unreachable: Nothing
Inherited from:
MarkupParserCommon
Inherited from:
MarkupParserCommon

Attribute value, terminated by either ' or ". The value may not contain <.

Value parameters:
endCh: either ' or "

Inherited from:
MarkupParserCommon

Parses attributes and creates the namespace scope and metadata.

[41] Attributes    ::= { S Name Eq AttValue }
Inherited from:
MarkupParser
'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

see [15]
Inherited from:
MarkupParser
Inherited from:
MarkupParserCommon
Inherited from:
MarkupParserCommon
def xCharRef(ch: () => Char, nextch: () => Unit): String

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Inherited from:
MarkupParserCommon
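The CharRef grammar above can be illustrated with a standalone decoder. This is not the library's xCharRef, which reads from the ch and nextch callbacks; it is a simplified sketch that takes the text between "&#" and ";" as a string. CharRefSketch and decode are hypothetical names.

```scala
import scala.util.Try

// Hypothetical standalone decoder, for illustration only.
object CharRefSketch {
  /** Decodes the part between "&#" and ";", e.g. "65" or "x41". */
  def decode(body: String): Option[String] = {
    // A leading lowercase 'x' selects hexadecimal, otherwise decimal.
    val (digits, radix) =
      if (body.startsWith("x")) (body.drop(1), 16) else (body, 10)
    val wellFormed =
      digits.nonEmpty && digits.forall(c => Character.digit(c, radix) >= 0)
    if (!wellFormed) None
    else
      Try(Integer.parseInt(digits, radix)).toOption
        .filter(cp => Character.isValidCodePoint(cp))
        .map(cp => new String(Character.toChars(cp)))
  }
}
```

Returning a String rather than a Char leaves room for code points outside the Basic Multilingual Plane.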
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]
Inherited from:
MarkupParser
def xEQ(): Unit

scan [S] '=' [S]

Inherited from:
MarkupParserCommon
def xEndTag(startName: String): Unit

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Inherited from:
MarkupParserCommon

Entity value, terminated by either ' or ". The value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`
Inherited from:
MarkupParser
def xHandleError(that: Char, msg: String): Unit
Inherited from:
MarkupParser

Actually, Name ::= (Letter | '_' | ':') (NameChar)*, but starting with ':' cannot happen, so effectively:

Name ::= (Letter | '_') (NameChar)*

See [5] of XML 1.0 specification.

pre-condition: ch != ':' // assured by definition of XMLSTART token
post-condition: name neither starts nor ends in ':'

Inherited from:
MarkupParserCommon

'<?' ProcInstr ::= Name [S ({Char} - ({Char}'?>' {Char})]'?>'

see [15]

Inherited from:
MarkupParserCommon
def xSpace(): Unit

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Inherited from:
MarkupParserCommon

skip optional space S?

Inherited from:
MarkupParserCommon
protected def xTag(pscope: NamespaceType): (String, AttributesType)

Parse a start or empty tag.

[40] STag ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Inherited from:
MarkupParserCommon
protected def xTakeUntil[T](handler: (PositionType, String) => T, positioner: () => PositionType, until: String): T

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Inherited from:
MarkupParserCommon
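The take-until behaviour described above can be sketched independently of the parser. This simplified version drops the position argument and returns None when the input ends before the terminator, whereas the library method is documented only as passing text and position to the handler. TakeUntilSketch and takeUntil are hypothetical names.

```scala
// Hypothetical standalone version, for illustration only.
object TakeUntilSketch {
  def takeUntil[T](in: Iterator[Char], until: String)(handler: String => T): Option[T] = {
    val buf = new java.lang.StringBuilder
    var result: Option[T] = None
    while (result.isEmpty && in.hasNext) {
      buf.append(in.next())
      // Index where the terminator would start if the buffer ends with it.
      val cut = buf.length() - until.length
      if (cut >= 0 && buf.substring(cut) == until)
        result = Some(handler(buf.substring(0, cut))) // pass text without the terminator
    }
    result
  }
}
```

For example, reading a comment body would call takeUntil with "-->" as the terminator and leave the iterator positioned just after it.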
def xToken(that: Seq[Char]): Unit
Inherited from:
MarkupParserCommon
def xToken(that: Char): Unit
Inherited from:
MarkupParserCommon
<? prolog ::= xml S ... ?>
Inherited from:
MarkupParser

Concrete fields

Inherited fields

protected val cbuf: StringBuilder

character buffer, for names

Inherited from:
MarkupParser
protected var curInput: Source
Inherited from:
MarkupParser
Inherited from:
MarkupHandler
protected var doc: Document
Inherited from:
MarkupParser
var dtd: DTD
Inherited from:
MarkupParser
Inherited from:
MarkupHandler
Inherited from:
MarkupParser

stack of inputs

Inherited from:
MarkupParser

Returns true if this markup handler is validating.

Inherited from:
MarkupHandler
Inherited from:
MarkupParser

holds the next character

Inherited from:
MarkupParser
var pos: Int

holds the position in the source file

Inherited from:
MarkupParser
Inherited from:
MarkupParser
var tmppos: Int

holds temporary values of pos

Inherited from:
MarkupParser