MarkupParser

trait MarkupParser extends MarkupParserCommon with TokenTests

An XML parser.

Parses XML 1.0, invokes callback methods of a MarkupHandler and returns whatever the markup handler returns. Use ConstructingParser if you just want to parse XML to construct instances of scala.xml.Node.

While XML elements are returned, DTD declarations - if handled - are collected using side-effects.
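
For the common case mentioned above, a minimal sketch of parsing a string into scala.xml.Node instances with ConstructingParser might look like the following. It assumes the scala-xml library is on the classpath; the object name ParseToNodes and the sample document are purely illustrative.

```scala
import scala.io.Source
import scala.xml.Document
import scala.xml.parsing.ConstructingParser

object ParseToNodes {
  // fromSource builds the parser and initializes it, so no manual nextch() call is needed.
  // The second argument is preserveWS; false lets the parser drop surplus whitespace.
  def parse(xml: String): Document =
    ConstructingParser.fromSource(Source.fromString(xml), false).document()

  def main(args: Array[String]): Unit = {
    val doc = parse("<greeting lang=\"en\">hello</greeting>")
    println(doc.docElem.label) // name of the root element
    println(doc.docElem.text)  // text content of the root element
  }
}
```

Implementing MarkupParser directly is only needed when the callbacks should build something other than scala.xml.Node trees.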

trait MarkupParserCommon
class Object
trait Matchable
class Any

Value members

Abstract methods

def externalSource(systemLiteral: String): Source

Concrete methods

def appendText(pos: Int, ts: NodeBuffer, txt: String): Unit
def attrDecl(): Unit
<! attlist := ATTLIST
def ch: Char
protected def ch_returning_nextch: Char
content1 ::=  '<' content1 | '&' charref ...
'<' content1 ::=  ...
[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S
'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'

<! element := ELEMENT
def errorNoEnd(tag: String): Nothing
externalID ::= SYSTEM S syslit
               PUBLIC S pubid S syslit

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.

"rec-xml/#ExtSubset" pe references may not occur within markup declarations

def markupDecl1(): Matchable
def mkProcInstr(position: Int, name: String, text: String): ElementType
def nextch(): Unit

this method tells ch to get the next character when next called

'N' notationDecl ::= "OTATION"
def parseDTD(): Unit

parses document type declaration and assigns it to instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >
def pop(): Unit
<? prolog ::= xml S?
// this is a bit more lenient than necessary...
[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
def push(entityName: String): Unit
def pushExternal(systemId: String): Unit
protected def putChar(c: Char): StringBuilder

append Unicode character to name buffer

def reportSyntaxError(pos: Int, str: String): Unit

attribute value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _ } `'`
                    | `"` { _ } `"`

prolog, but without standalone

def truncatedError(msg: String): Nothing

parse attribute and create namespace scope, metadata

[41] Attributes    ::= { S Name Eq AttValue }
'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]

entity value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`
def xHandleError(that: Char, msg: String): Unit
<? prolog ::= xml S ... ?>

Inherited methods

Inherited from:
TokenTests
protected def errorAndResult[T](msg: String, x: T): T
Inherited from:
MarkupParserCommon

These are 99% sure to be redundant but refactoring on the safe side.

Inherited from:
TokenTests

See [5] of XML 1.0 specification.

Name ::= ( Letter | '_' ) (NameChar)*

Inherited from:
TokenTests

See [4] and [4a] of Appendix B of XML 1.0 specification.

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

Inherited from:
TokenTests

NameStart ::= ( Letter | '_' | ':' )

where Letter means a character in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

We do not allow a name to start with :. See [4] and Appendix B of XML 1.0 specification

Inherited from:
TokenTests
final def isSpace(cs: Seq[Char]): Boolean
(#x20 | #x9 | #xD | #xA)+
Inherited from:
TokenTests
final def isSpace(ch: Char): Boolean
(#x20 | #x9 | #xD | #xA)
Inherited from:
TokenTests
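
The two isSpace overloads follow production [3] of XML 1.0. A self-contained sketch of the same checks, written here without the library, could look like this. The object name SpaceSketch is hypothetical, and the requirement that a sequence be non-empty is an assumption read off the `+` in the grammar above.

```scala
object SpaceSketch {
  // Single character: (#x20 | #x9 | #xD | #xA)
  def isSpace(ch: Char): Boolean = ch match {
    case ' ' | '\t' | '\r' | '\n' => true
    case _                        => false
  }

  // Sequence: (#x20 | #x9 | #xD | #xA)+, so at least one character is required.
  def isSpace(cs: Seq[Char]): Boolean =
    cs.nonEmpty && cs.forall(c => isSpace(c))
}
```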
def isValidIANAEncoding(ianaEncoding: Seq[Char]): Boolean

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value parameters:
ianaEncoding

The IANA encoding name.

Inherited from:
TokenTests
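
To illustrate what "the characters are valid" can mean, here is a sketch that checks a name against the EncName production [81] of XML 1.0, `[A-Za-z] ([A-Za-z0-9._] | '-')*`. Whether isValidIANAEncoding applies exactly this rule is an assumption; the object and method names are hypothetical.

```scala
object EncodingNameSketch {
  private def isAsciiLetter(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

  private def isAsciiDigit(c: Char): Boolean =
    c >= '0' && c <= '9'

  // First character must be an ASCII letter; the rest may be letters, digits, '.', '_' or '-'.
  // Like the documented method, this only inspects characters and never looks up a decoder.
  def looksLikeEncName(name: Seq[Char]): Boolean =
    name.nonEmpty &&
      isAsciiLetter(name.head) &&
      name.tail.forall(c => isAsciiLetter(c) || isAsciiDigit(c) || "._-".indexOf(c) >= 0)
}
```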
def returning[T](x: T)(f: T => Unit): T

Apply a function and return the passed value

Inherited from:
MarkupParserCommon
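
A helper with this signature can be written in one line. The sketch below is a standalone re-implementation for illustration, not a copy of the library source; ReturningSketch and buildGreeting are hypothetical names.

```scala
object ReturningSketch {
  // Run the side-effecting function on x, then hand x back to the caller.
  def returning[T](x: T)(f: T => Unit): T = { f(x); x }

  // Typical use: configure a mutable value and return it in a single expression.
  def buildGreeting(): String =
    returning(new StringBuilder) { sb =>
      sb.append("hello")
      sb.append(", world")
      ()
    }.toString
}
```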
def saving[A, B](getter: A, setter: A => Unit)(body: => B): B

Execute body with a variable saved and restored after execution

Inherited from:
MarkupParserCommon
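
The save-and-restore pattern behind this signature can be sketched as follows. This is an illustrative standalone version; restoring in a `finally` block is an assumption about the intended behaviour, and SavingSketch, depth and demo are hypothetical names.

```scala
object SavingSketch {
  // Remember the current value, run the body, then write the remembered value back.
  // The restore happens in finally, so it also runs when the body throws.
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter
    try body
    finally setter(saved)
  }

  var depth: Int = 0

  // Returns (value seen inside the body, value after the body has finished).
  def demo(): (Int, Int) = {
    val inside = saving[Int, Int](depth, d => { depth = d }) {
      depth = 5
      depth
    }
    (inside, depth)
  }
}
```

Because `getter` is passed by value, the variable is read once, before the body runs.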
protected def unreachable: Nothing
Inherited from:
MarkupParserCommon

attribute value, terminated by either ' or ". value may not contain <.

Value parameters:
endCh

either ' or "

Inherited from:
MarkupParserCommon
def xCharRef(ch: () => Char, nextch: () => Unit): String

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Inherited from:
MarkupParserCommon
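
The `ch` and `nextch` parameters let the caller supply the character stream as two closures. The sketch below decodes a character reference in that style. It is a standalone illustration, not the library implementation, and it assumes the leading "&#" has already been consumed; CharRefSketch, charRef and decode are hypothetical names.

```scala
object CharRefSketch {
  // Value of an ASCII digit in the given base, or -1 if the character is not a digit.
  private def digit(c: Char, base: Int): Int =
    if (c >= '0' && c <= '9') c - '0'
    else if (base == 16 && c >= 'a' && c <= 'f') c - 'a' + 10
    else if (base == 16 && c >= 'A' && c <= 'F') c - 'A' + 10
    else -1

  // ch() peeks at the current character, nextch() advances by one.
  def charRef(ch: () => Char, nextch: () => Unit): String = {
    val base = if (ch() == 'x') { nextch(); 16 } else 10
    var code = 0
    var digits = 0
    while (ch() != ';') {
      val d = digit(ch(), base)
      require(d >= 0, s"illegal character '${ch()}' in character reference")
      code = code * base + d
      digits += 1
      nextch()
    }
    require(digits > 0, "character reference without digits")
    nextch() // consume the terminating ';'
    new String(Character.toChars(code))
  }

  // Convenience wrapper: decode the text that follows "&#", for example "x41;".
  def decode(body: String): String = {
    var i = 0
    charRef(() => body.charAt(i), () => i += 1)
  }
}
```

The sketch does not guard against missing terminators or overflowing code points; a real parser would report those through its error callbacks.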
def xEQ(): Unit

scan [S] '=' [S]

Inherited from:
MarkupParserCommon
def xEndTag(startName: String): Unit

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Inherited from:
MarkupParserCommon

Name ::= (Letter | '_') (NameChar)*

Strictly, Name ::= (Letter | '_' | ':') (NameChar)*, but a name starting with ':' cannot occur here.

see [5] of XML 1.0 specification

pre-condition: ch != ':' (assured by the definition of the XMLSTART token)
post-condition: the name neither starts nor ends with ':'

Inherited from:
MarkupParserCommon

'?' {Char})]'?>'

see [15]

Inherited from:
MarkupParserCommon
def xSpace(): Unit

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Inherited from:
MarkupParserCommon

skip optional space S?

Inherited from:
MarkupParserCommon
protected def xTag(pscope: NamespaceType): (String, AttributesType)

parse a start or empty tag.

[40] STag         ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Inherited from:
MarkupParserCommon
protected def xTakeUntil[T](handler: (PositionType, String) => T, positioner: () => PositionType, until: String): T

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Inherited from:
MarkupParserCommon
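
The take-until behaviour described above can be sketched over a plain character iterator. This simplified version is an illustration only: it drops the position argument and the `positioner` parameter, and the names TakeUntilSketch and takeUntil are hypothetical.

```scala
object TakeUntilSketch {
  // Accumulate characters until the buffer ends with `until`,
  // then pass everything before the terminator to the handler.
  def takeUntil[T](input: Iterator[Char], until: String)(handler: String => T): T = {
    val sb = new java.lang.StringBuilder
    def terminated: Boolean =
      sb.length() >= until.length &&
        sb.substring(sb.length() - until.length) == until
    while (!terminated) {
      if (!input.hasNext)
        throw new IllegalArgumentException(s"input ended before '$until' was seen")
      sb.append(input.next())
    }
    handler(sb.substring(0, sb.length() - until.length))
  }
}
```

The terminator itself is consumed from the input but not passed to the handler, so the iterator is left positioned just after it.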
def xToken(that: Seq[Char]): Unit
Inherited from:
MarkupParserCommon
def xToken(that: Char): Unit
Inherited from:
MarkupParserCommon

Abstract fields

if true, does not remove surplus whitespace

Concrete fields

protected val cbuf: StringBuilder

character buffer, for names

protected var curInput: Source
protected var doc: Document
var dtd: DTD

stack of inputs

holds the next character

var pos: Int

holds the position in the source file

var tmppos: Int

holds temporary values of pos