MarkupParser

trait MarkupParser extends MarkupParserCommon with TokenTests

An XML parser.

Parses XML 1.0, invokes callback methods of a MarkupHandler and returns whatever the markup handler returns. Use ConstructingParser if you just want to parse XML to construct instances of scala.xml.Node.

While XML elements are returned, DTD declarations, if handled, are collected through side effects.

trait MarkupParserCommon
class Object
trait Matchable
class Any

Type members

Value members

Abstract methods

def externalSource(systemLiteral: String): Source

Concrete methods

def appendText(pos: Int, ts: NodeBuffer, txt: String): Unit
def attrDecl(): Unit
<! attlist := ATTLIST
def ch: Char
protected def ch_returning_nextch: Char
content1 ::=  '<' content1 | '&' charref ...
def content1(pscope: NamespaceBinding, ts: NodeBuffer): Unit
'<' content1 ::=  ...
[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S
'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'
def elementDecl(): Unit

<! element := ELEMENT

def entityDecl(): Unit
<! element := ELEMENT
def eof: Boolean
def errorNoEnd(tag: String): Nothing
def extSubset(): Unit
externalID ::= SYSTEM S syslit
             | PUBLIC S pubid S syslit

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.

def intSubset(): Unit

"rec-xml/#ExtSubset" pe references may not occur within markup declarations

def lookahead(): BufferedIterator[Char]
def markupDecl(): Unit
def markupDecl1(): Matchable
def mkAttributes(name: String, pscope: NamespaceBinding): AttributesType
def mkProcInstr(position: Int, name: String, text: String): ElementType
def nextch(): Unit

this method tells ch to get the next character when next called
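
The interplay between ch and nextch described above can be illustrated with a minimal sketch. The class name LazyCursor is hypothetical and not part of the library; the field names nextChNeeded, lastChRead and reachedEof are borrowed from the fields listed on this page. The sketch assumes a single input and leaves out position tracking and the input stack.

```scala
// Hypothetical stand-in for the parser's character cursor; not the library class.
final class LazyCursor(input: Iterator[Char]) {
  private var nextChNeeded = true          // true when ch must pull a fresh character
  private var lastChRead: Char = '\u0000'  // most recently read character
  private var reachedEof = false

  /** Current character; reads from the input only when a read is pending. */
  def ch: Char = {
    if (nextChNeeded) {
      if (input.hasNext) lastChRead = input.next()
      else { reachedEof = true; lastChRead = '\u0000' }
      nextChNeeded = false
    }
    lastChRead
  }

  /** Makes sure the current character was read, then schedules the next read. */
  def nextch(): Unit = { ch; nextChNeeded = true }

  /** True once a read was attempted past the end of the input. */
  def eof: Boolean = { ch; reachedEof }
}
```

In this sketch, calling ch repeatedly returns the same character; only a call to nextch makes the following ch advance.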

def notationDecl(): Unit
'N' notationDecl ::= "OTATION"
def parseDTD(): Unit

parses document type declaration and assigns it to instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >
def pop(): Unit
def prolog(): (Option[String], Option[String], Option[Boolean])
<? prolog ::= xml S?
// this is a bit more lenient than necessary...
def pubidLiteral(): String
[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
def push(entityName: String): Unit
def pushExternal(systemId: String): Unit
protected def putChar(c: Char): StringBuilder

append Unicode character to name buffer

def reportSyntaxError(pos: Int, str: String): Unit
def reportSyntaxError(str: String): Unit
def reportValidationError(pos: Int, str: String): Unit
def systemLiteral(): String

attribute value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _ } `'`
                    | `"` { _ } `"`
def textDecl(): (Option[String], Option[String])

prolog, but without standalone

def truncatedError(msg: String): Nothing

parse attribute and create namespace scope, metadata

[41] Attributes    ::= { S Name Eq AttValue }
'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

see [15]
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]
def xEntityValue(): String

entity value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`
def xHandleError(that: Char, msg: String): Unit
<? prolog ::= xml S ... ?>

Inherited methods

def checkPubID(s: String): Boolean
Inherited from
TokenTests
def checkSysID(s: String): Boolean
Inherited from
TokenTests
protected def errorAndResult[T](msg: String, x: T): T
Inherited from
MarkupParserCommon
def isAlpha(c: Char): Boolean

These are 99% sure to be redundant but refactoring on the safe side.

Inherited from
TokenTests
def isAlphaDigit(c: Char): Boolean
Inherited from
TokenTests
def isName(s: String): Boolean

Name ::= ( Letter | '_' ) (NameChar)*

See [5] of XML 1.0 specification.

Inherited from
TokenTests
def isNameChar(ch: Char): Boolean

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

See [4] and [4a] of Appendix B of XML 1.0 specification.

Inherited from
TokenTests
def isNameStart(ch: Char): Boolean

NameStart ::= ( Letter | '_' | ':' )

where Letter means a character in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

Although the production above lists ':', this implementation does not allow a name to start with ':'. See [4] and Appendix B of the XML 1.0 specification.

Inherited from
TokenTests
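
The three name tests above (isName, isNameChar, isNameStart) can be approximated with the Unicode general categories exposed by java.lang.Character. The following is a sketch under that assumption, not the library source; the object name NameSketch is hypothetical, and treating the mark, modifier-letter and decimal-digit categories as CombiningChar, Extender and Digit is an approximation.

```scala
object NameSketch {
  import java.lang.Character._

  /** Letter (Ll, Lu, Lo, Lt, Nl) or '_'; ':' is deliberately rejected. */
  def isNameStart(ch: Char): Boolean =
    getType(ch).toByte match {
      case LOWERCASE_LETTER | UPPERCASE_LETTER | OTHER_LETTER |
           TITLECASE_LETTER | LETTER_NUMBER => true
      case _ => ch == '_'
    }

  /** Name-start characters plus digits, marks, and '.', '-', ':', #xB7. */
  def isNameChar(ch: Char): Boolean =
    isNameStart(ch) || (getType(ch).toByte match {
      case COMBINING_SPACING_MARK | ENCLOSING_MARK | NON_SPACING_MARK |
           MODIFIER_LETTER | DECIMAL_DIGIT_NUMBER => true
      case _ => ch == '.' || ch == '-' || ch == ':' || ch == '\u00B7'
    })

  /** Non-empty, starts with a name-start character, continues with name characters. */
  def isName(s: String): Boolean =
    s.nonEmpty && isNameStart(s.head) && s.tail.forall(c => isNameChar(c))
}
```

With these definitions a prefixed name such as "a:b" is accepted, while ":a" and "1a" are rejected.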
def isPubIDChar(ch: Char): Boolean
Inherited from
TokenTests
final def isSpace(cs: Seq[Char]): Boolean
(#x20 | #x9 | #xD | #xA)+
Inherited from
TokenTests
final def isSpace(ch: Char): Boolean
(#x20 | #x9 | #xD | #xA)
Inherited from
TokenTests
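
The two isSpace productions above translate directly into code. This is a minimal sketch, assuming only the four characters listed in the productions count as whitespace; the object name SpaceSketch is hypothetical. The sequence variant requires at least one character because of the `+` in its production.

```scala
object SpaceSketch {
  /** (#x20 | #x9 | #xD | #xA) */
  def isSpace(ch: Char): Boolean = ch match {
    case ' ' | '\t' | '\r' | '\n' => true
    case _                        => false
  }

  /** (#x20 | #x9 | #xD | #xA)+ : non-empty and whitespace only. */
  def isSpace(cs: Seq[Char]): Boolean =
    cs.nonEmpty && cs.forall(c => isSpace(c))
}
```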
def isValidIANAEncoding(ianaEncoding: Seq[Char]): Boolean

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value Params
ianaEncoding

The IANA encoding name.

Inherited from
TokenTests
def returning[T](x: T)(f: T => Unit): T

Apply a function and return the passed value

Inherited from
MarkupParserCommon
def saving[A, B](getter: A, setter: A => Unit)(body: => B): B

Execute body with a variable saved and restored after execution

Inherited from
MarkupParserCommon
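
The signatures of returning and saving suggest how such helpers are typically written. The sketch below is one plausible shape, assuming the setter is invoked with the saved value even when body throws; the enclosing object ScopeSketch is hypothetical.

```scala
object ScopeSketch {
  /** Applies f to x for its side effect, then hands x back. */
  def returning[T](x: T)(f: T => Unit): T = { f(x); x }

  /** Remembers the current value, runs body, then restores the value via setter. */
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter
    try body
    finally setter(saved)
  }
}
```

A call such as `ScopeSketch.saving(depth, (d: Int) => { depth = d }) { depth = 5; depth }`, with a hypothetical `var depth`, yields 5 and leaves depth at its earlier value afterwards.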
protected def unreachable: Nothing
Inherited from
MarkupParserCommon
def xAttributeValue(): String
Inherited from
MarkupParserCommon
def xAttributeValue(endCh: Char): String

attribute value, terminated by either ' or ". value may not contain <.

Value Params
endCh

either ' or "

Inherited from
MarkupParserCommon
def xCharRef: String
Inherited from
MarkupParserCommon
def xCharRef(it: Iterator[Char]): String
Inherited from
MarkupParserCommon
def xCharRef(ch: () => Char, nextch: () => Unit): String

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Inherited from
MarkupParserCommon
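
To make the CharRef production concrete, here is a small standalone decoder. It is a sketch rather than the library's xCharRef: it works on a String holding the text that follows "&#" instead of the ch/nextch callbacks, returns None where the parser would report an error, and the names CharRefSketch and decode are hypothetical.

```scala
import scala.util.Try

object CharRefSketch {
  /** `rest` is the input following "&#", for example "65;" or "x41;". */
  def decode(rest: String): Option[String] = {
    val end = rest.indexOf(';')
    if (end < 0) None // truncated reference: no terminating ';'
    else {
      val hex = rest.startsWith("x")
      val base = if (hex) 16 else 10
      val digits = rest.substring(if (hex) 1 else 0, end)
      // ASCII digits only, as in the production above
      def ok(c: Char): Boolean =
        (c >= '0' && c <= '9') ||
          (hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')))
      if (digits.isEmpty || !digits.forall(c => ok(c))) None
      else
        Try(Integer.parseInt(digits, base)).toOption // None on overflow
          .filter(cp => Character.isValidCodePoint(cp))
          .map(cp => new String(Character.toChars(cp)))
    }
  }
}
```

Both `CharRefSketch.decode("65;")` and `CharRefSketch.decode("x41;")` yield `Some("A")`.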
def xEQ(): Unit

scan [S] '=' [S]

Inherited from
MarkupParserCommon
def xEndTag(startName: String): Unit

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Inherited from
MarkupParserCommon
def xName: String

actually, Name ::= (Letter | '_' | ':') (NameChar)* but starting with ':' cannot happen, so effectively
Name ::= (Letter | '_') (NameChar)*

see [5] of XML 1.0 specification

pre-condition: ch != ':' // assured by definition of XMLSTART token
post-condition: the name neither starts nor ends with ':'

Inherited from
MarkupParserCommon

'<?' ProcInstr ::= Name [S ({Char} - ({Char}'?>' {Char}))]'?>'

see [16]

Inherited from
MarkupParserCommon
def xSpace(): Unit

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Inherited from
MarkupParserCommon
def xSpaceOpt(): Unit

skip optional space S?

Inherited from
MarkupParserCommon
protected def xTag(pscope: NamespaceType): (String, AttributesType)

parse a start or empty tag.

[40] STag         ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Inherited from
MarkupParserCommon
protected def xTakeUntil[T](handler: (PositionType, String) => T, positioner: () => PositionType, until: String): T

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Inherited from
MarkupParserCommon
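
The behaviour described for xTakeUntil can be sketched on a plain String. This is an illustration, not the library method: the names TakeUntilSketch and takeUntil are hypothetical, the position handed to the handler is assumed to be the start offset, and a missing terminator yields None where the parser would raise a truncation error.

```scala
object TakeUntilSketch {
  /** Collects input from `start` up to (not including) `until` and hands it to `handler`. */
  def takeUntil[T](input: String, start: Int, until: String)(
      handler: (Int, String) => T): Option[T] = {
    val end = input.indexOf(until, start)
    if (end < 0) None // terminator never seen
    else Some(handler(start, input.substring(start, end)))
  }
}
```

For example, scanning a comment body with `TakeUntilSketch.takeUntil("<!-- hi -->rest", 4, "-->")((_, s) => s.trim)` passes " hi " to the handler and yields `Some("hi")`.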
def xToken(that: Seq[Char]): Unit
Inherited from
MarkupParserCommon
def xToken(that: Char): Unit
Inherited from
MarkupParserCommon

Abstract fields

val input: Source
val preserveWS: Boolean

if true, does not remove surplus whitespace

Concrete fields

protected val cbuf: StringBuilder

character buffer, for names

protected var curInput: Source
protected var doc: Document
var dtd: DTD
var extIndex: Int
var inpStack: List[Source]

stack of inputs

var lastChRead: Char
var nextChNeeded: Boolean

holds the next character

var pos: Int

holds the position in the source file

var reachedEof: Boolean
var tmppos: Int

holds temporary values of pos