MarkupParser

trait MarkupParser extends MarkupParserCommon with TokenTests

An XML parser.

Parses XML 1.0, invokes callback methods of a MarkupHandler and returns whatever the markup handler returns. Use ConstructingParser if you just want to parse XML to construct instances of scala.xml.Node.

While XML elements are returned, DTD declarations - if handled - are collected using side-effects.
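For the common case of building scala.xml.Node instances, a minimal sketch using ConstructingParser might look like the following. It assumes scala-xml is on the classpath; the names ParseSketch and rootLabel are illustrative and not part of the library.

```scala
import scala.io.Source
import scala.xml.parsing.ConstructingParser

object ParseSketch {
  // Parses the given XML text and returns the label of the root element.
  def rootLabel(xml: String): String = {
    // fromSource builds the parser and initializes it; false means
    // surplus whitespace is not preserved.
    val parser = ConstructingParser.fromSource(Source.fromString(xml), false)
    // document() runs the parser and returns a scala.xml.Document.
    val doc = parser.document()
    doc.docElem.label
  }
}
```

Other MarkupHandler implementations would receive the same callbacks but could return something other than nodes.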

Supertypes

trait MarkupParserCommon

trait TokenTests

class Object

trait Matchable

class Any

Known subtypes

class ConstructingParser

class XhtmlParser

Type members

Types

Value members

Abstract methods

Concrete methods

<! attlist := ATTLIST

content1 ::=  '<' content1 | '&' charref ...

'<' content1 ::=  ...

[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S

'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'

<! element := ELEMENT

<! element := ELEMENT

externalID ::= SYSTEM S syslit
             | PUBLIC S pubid S syslit

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.
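As a hedged illustration of that requirement, the sketch below constructs a ConstructingParser directly and calls initialize before parsing. It assumes scala-xml is on the classpath; InitializeSketch and parse are hypothetical names.

```scala
import scala.io.Source
import scala.xml.Document
import scala.xml.parsing.ConstructingParser

object InitializeSketch {
  def parse(xml: String): Document = {
    // Per the note above, a freshly constructed parser still needs priming.
    val parser = new ConstructingParser(Source.fromString(xml), false)
    // initialize returns the parser itself, so the calls can be chained.
    parser.initialize.document()
  }
}
```

ConstructingParser.fromSource performs the same initialization step, so the explicit call is only needed when the constructor is used directly.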

See "rec-xml/#ExtSubset": parameter-entity (PE) references may not occur within markup declarations.

Tells ch to fetch the next character the next time it is read.

'N' notationDecl ::= "OTATION"

Parses the document type declaration and assigns it to the instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >

<? prolog ::= xml S?
// this is a bit more lenient than necessary...

[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"

append Unicode character to name buffer

attribute value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _ } `'`
                    | `"` { _ } `"`

prolog, but without standalone

parse attribute and create namespace scope, metadata

[41] Attributes    ::= { S Name Eq AttValue }

'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

see [15]

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]

entity value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`

<? prolog ::= xml S ... ?>

Inherited methods

Inherited from: TokenTests

Inherited from: TokenTests

Inherited from: MarkupParserCommon

These are 99% sure to be redundant but refactoring on the safe side.

Inherited from: TokenTests

Inherited from: TokenTests

Name ::= ( Letter | '_' ) (NameChar)*

See [5] of XML 1.0 specification.

Inherited from: TokenTests

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

See [4] and [4a] of Appendix B of XML 1.0 specification.

Inherited from: TokenTests

NameStart ::= ( Letter | '_' | ':' )

where Letter means in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

Although the production above lists ':', this implementation does not allow a name to start with ':'. See [4] and Appendix B of the XML 1.0 specification.

Inherited from: TokenTests
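The name predicates can be tried in isolation. The sketch below assumes scala-xml is on the classpath and that TokenTests can be mixed into an otherwise empty object; NameChecks and validNames are hypothetical names.

```scala
import scala.xml.parsing.TokenTests

// Hypothetical holder object that exposes the inherited token tests.
object NameChecks extends TokenTests {
  // Keeps only the candidates that TokenTests accepts as XML names.
  def validNames(candidates: List[String]): List[String] =
    candidates.filter(c => isName(c))
}
```

For example, NameChecks.validNames(List("xml:lang", "1abc", "_id")) should drop "1abc" because a digit is not a valid name start, while a ':' inside a name is accepted as a NameChar.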

Inherited from: TokenTests

(#x20 | #x9 | #xD | #xA)+

Inherited from: TokenTests

(#x20 | #x9 | #xD | #xA)

Inherited from: TokenTests

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value Params

ianaEncoding: The IANA encoding name.

Inherited from

TokenTests
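A short sketch of the encoding check follows. It assumes scala-xml is on the classpath; EncodingChecks and looksValid are hypothetical names. As the description says, the check is purely syntactic and does not tell you whether a decoder exists.

```scala
import scala.xml.parsing.TokenTests

object EncodingChecks extends TokenTests {
  // Hypothetical helper: true when the name is syntactically acceptable
  // as an IANA encoding name.
  def looksValid(name: String): Boolean = isValidIANAEncoding(name.toSeq)
}
```

Under this check a name such as "UTF-8" is accepted, while an empty string or a name starting with a digit is rejected.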

Apply a function and return the passed value

Inherited from: MarkupParserCommon
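The idea can be sketched without the library. The following is a standalone illustration of the pattern, not the library's source; ReturningSketch is a hypothetical name and the signature is an assumption modelled on the description.

```scala
object ReturningSketch {
  // Runs the side-effecting function on x, then hands x back unchanged.
  def returning[T](x: T)(f: T => Unit): T = {
    f(x)
    x
  }
}
```

This is handy for configuring a freshly created object inline, for example ReturningSketch.returning(new StringBuilder)(sb => { sb.append("abc"); () }).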

Execute body with a variable saved and restored after execution

Inherited from: MarkupParserCommon
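A standalone sketch of the save-and-restore pattern is shown below. It is an illustration under assumptions, not the library's source; SavingSketch is a hypothetical name, and the getter is taken by value so the current state is captured at the call site.

```scala
object SavingSketch {
  // Captures the current value, runs the body, and restores the value
  // afterwards even if the body throws.
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter
    try body
    finally setter(saved)
  }
}
```

A parser can use this shape to change a field temporarily while parsing a nested construct, with the finally clause guaranteeing the original value comes back.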

Inherited from: MarkupParserCommon

Inherited from: MarkupParserCommon

attribute value, terminated by either ' or ". value may not contain <.

Value Params

endCh: either ' or "

Inherited from

MarkupParserCommon

Inherited from: MarkupParserCommon

Inherited from: MarkupParserCommon

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Inherited from: MarkupParserCommon
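To make the production concrete, here is a standalone sketch that decodes a complete character reference held in a string. It is not the parser's own routine, which reads from the input stream; CharRefSketch and decode are hypothetical names.

```scala
import scala.util.Try

object CharRefSketch {
  // ASCII-only digit test matching the ranges in the production.
  private def isDigit(c: Char, hex: Boolean): Boolean =
    (c >= '0' && c <= '9') ||
      (hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')))

  // Decodes "&#NNN;" or "&#xHHH;"; returns None for malformed input.
  def decode(ref: String): Option[String] =
    if (!ref.startsWith("&#") || !ref.endsWith(";")) None
    else {
      val body = ref.substring(2, ref.length - 1)
      val hex = body.startsWith("x")
      val digits = if (hex) body.substring(1) else body
      if (digits.isEmpty || !digits.forall(c => isDigit(c, hex))) None
      else
        Try(Integer.parseInt(digits, if (hex) 16 else 10)).toOption
          .filter(cp => Character.isValidCodePoint(cp))
          .map(cp => new String(Character.toChars(cp)))
    }
}
```

The Try guards against overflow on very long digit runs, and the code point check rejects values outside the Unicode range.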

scan [S] '=' [S]

Inherited from: MarkupParserCommon

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Inherited from: MarkupParserCommon

actually, Name ::= (Letter | '_' | ':') (NameChar)* but starting with ':' cannot happen

Name ::= (Letter | '_') (NameChar)*

see [5] of XML 1.0 specification

pre-condition: ch != ':' // assured by definition of XMLSTART token

post-condition: name neither starts nor ends in ':'

Inherited from: MarkupParserCommon

'<?' ProcInstr ::= Name [S ({Char} - ({Char}'?>' {Char}))]'?>'

see [16]

Inherited from: MarkupParserCommon

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Inherited from: MarkupParserCommon

skip optional space S?

Inherited from: MarkupParserCommon

parse a start or empty tag.

[40] STag         ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Inherited from: MarkupParserCommon

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Inherited from: MarkupParserCommon
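The scanning idea can be sketched over a plain Iterator[Char]. This is a simplified standalone illustration, not the library's method: it omits position tracking, and it returns None at end of input where a real parser would report an error. TakeUntilSketch and takeUntil are hypothetical names.

```scala
object TakeUntilSketch {
  // Consumes characters until the terminator has been read, then passes
  // the text before the terminator to the handler.
  def takeUntil[T](in: Iterator[Char], until: String)(handler: String => T): Option[T] = {
    val sb = new StringBuilder
    var result: Option[T] = None
    while (result.isEmpty && in.hasNext) {
      sb.append(in.next())
      val text = sb.toString
      if (text.endsWith(until))
        result = Some(handler(text.dropRight(until.length)))
    }
    result
  }
}
```

Rebuilding the string on every character keeps the sketch short at the cost of quadratic work; a production scanner would match the terminator incrementally.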

Inherited from: MarkupParserCommon

Inherited from: MarkupParserCommon

Abstract fields

if true, does not remove surplus whitespace

Concrete fields

character buffer, for names

stack of inputs

holds the next character

holds the position in the source file

holds temporary values of pos