XhtmlParser

class XhtmlParser(val input: Source) extends ConstructingHandler with MarkupParser with ExternalSources

An XML Parser that preserves CDATA blocks and knows about scala.xml.parsing.XhtmlEntities.
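A minimal usage sketch, assuming scala-xml is on the classpath and that the companion object's `apply(source)` is used to build and run the parser. The object name `XhtmlParserDemo`, the helper `parse`, and the sample markup are illustrative only.

```scala
import scala.io.Source
import scala.xml.NodeSeq
import scala.xml.parsing.XhtmlParser

object XhtmlParserDemo {
  // Parse a string of XHTML through the companion object's apply method.
  def parse(markup: String): NodeSeq =
    XhtmlParser(Source.fromString(markup))

  def main(args: Array[String]): Unit = {
    // The sample contains an XHTML entity and a CDATA block.
    val nodes = parse("<p>a&nbsp;b <![CDATA[1 < 2]]></p>")
    println(nodes)
  }
}
```

The input is a scala.io.Source, so files and streams can be parsed the same way as the in-memory string used here.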

Authors

(c) David Pollak, 2007 WorldWide Conferencing, LLC.

Companion
object
trait MarkupParserCommon
class Object
trait Matchable
class Any

Type members

Inherited types

Inherited from
MarkupParser
type InputType = Source
Inherited from
MarkupParser
type PositionType = Int
Inherited from
MarkupParser

Value members

Inherited methods

def appendText(pos: Int, ts: NodeBuffer, txt: String): Unit
Inherited from
MarkupParser
def attListDecl(name: String, attList: List[AttrDecl]): Unit
Inherited from
MarkupHandler
def attrDecl(): Unit
<! attlist := ATTLIST
Inherited from
MarkupParser
def ch: Char
Inherited from
MarkupParser
protected def ch_returning_nextch: Char
Inherited from
MarkupParser
def checkPubID(s: String): Boolean
Inherited from
TokenTests
def checkSysID(s: String): Boolean
Inherited from
TokenTests
def comment(pos: Int, txt: String): NodeSeq
Inherited from
ConstructingHandler
content1 ::=  '<' content1 | '&' charref ...
Inherited from
MarkupParser
def content1(pscope: NamespaceBinding, ts: NodeBuffer): Unit
'<' content1 ::=  ...
Inherited from
MarkupParser
[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S
Inherited from
MarkupParser
def elem(pos: Int, pre: String, label: String, attrs: MetaData, pscope: NamespaceBinding, empty: Boolean, nodes: NodeSeq): NodeSeq
Inherited from
ConstructingHandler
def elemDecl(n: String, cmstr: String): Unit
Inherited from
MarkupHandler
def elemEnd(pos: Int, pre: String, label: String): Unit

callback method invoked by MarkupParser after end-tag of element.

Value Params
label

the local name

pos

the position in the source file

pre

the prefix

Inherited from
MarkupHandler
def elemStart(pos: Int, pre: String, label: String, attrs: MetaData, scope: NamespaceBinding): Unit

callback method invoked by MarkupParser after start-tag of element.

Value Params
attrs

the attributes (metadata)

label

the local name

pos

the position in the source file

pre

the prefix

Inherited from
MarkupHandler
Inherited from
MarkupParser
'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'
Inherited from
MarkupParser
def elementDecl(): Unit

<! element := ELEMENT

Inherited from
MarkupParser
def endDTD(n: String): Unit
Inherited from
MarkupHandler
def entityDecl(): Unit
<! entity := ENTITY
Inherited from
MarkupParser
def entityRef(pos: Int, n: String): NodeSeq
Inherited from
ConstructingHandler
def eof: Boolean
Inherited from
MarkupParser
protected def errorAndResult[T](msg: String, x: T): T
Inherited from
MarkupParserCommon
def errorNoEnd(tag: String): Nothing
Inherited from
MarkupParser
def extSubset(): Unit
Inherited from
MarkupParser
externalID ::= SYSTEM S syslit
               PUBLIC S pubid S syslit
Inherited from
MarkupParser
def externalSource(systemId: String): Source
Inherited from
ExternalSources

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.

Inherited from
MarkupParser
def intSubset(): Unit

"rec-xml/#ExtSubset" pe references may not occur within markup declarations

Inherited from
MarkupParser
def isAlpha(c: Char): Boolean

These are 99% sure to be redundant but refactoring on the safe side.

Inherited from
TokenTests
def isAlphaDigit(c: Char): Boolean
Inherited from
TokenTests
def isName(s: String): Boolean

See [5] of XML 1.0 specification.

Name ::= ( Letter | '_' ) (NameChar)*

Inherited from
TokenTests
def isNameChar(ch: Char): Boolean

See [4] and [4a] of Appendix B of XML 1.0 specification.

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

Inherited from
TokenTests
def isNameStart(ch: Char): Boolean

NameStart ::= ( Letter | '_' | ':' )

where Letter means a character in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

We do not allow a name to start with ':'. See [4] and Appendix B of the XML 1.0 specification.

Inherited from
TokenTests
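The grammar for isName, isNameChar and isNameStart can be sketched with the standard library alone. This is an illustration of the productions quoted above, not the TokenTests source: the object name `XmlNames` is hypothetical, CombiningChar and Extender are left out, Digit is approximated with `Character.isDigit`, and a leading ':' is rejected on the strength of the note above.

```scala
object XmlNames {
  // "Letter" in the grammar: Unicode general categories Ll, Lu, Lo, Lt, Nl.
  private val letterCategories: Set[Byte] = Set[Byte](
    Character.LOWERCASE_LETTER,
    Character.UPPERCASE_LETTER,
    Character.OTHER_LETTER,
    Character.TITLECASE_LETTER,
    Character.LETTER_NUMBER
  )

  def isLetter(ch: Char): Boolean =
    letterCategories.contains(Character.getType(ch).toByte)

  // NameStart without ':' (assumption taken from the note above).
  def isNameStart(ch: Char): Boolean = isLetter(ch) || ch == '_'

  // Simplified NameChar: Letter | Digit | '.' | '-' | '_' | ':' | #xB7.
  def isNameChar(ch: Char): Boolean =
    isLetter(ch) || Character.isDigit(ch) ||
      ch == '.' || ch == '-' || ch == '_' || ch == ':' || ch == 0xB7.toChar

  // Name ::= NameStart (NameChar)*
  def isName(s: String): Boolean =
    s.nonEmpty && isNameStart(s.head) && s.tail.forall(isNameChar)
}
```

With these definitions a prefixed name such as `a:b` is accepted, while names starting with a digit or a colon are rejected.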
def isPubIDChar(ch: Char): Boolean
Inherited from
TokenTests
final def isSpace(cs: Seq[Char]): Boolean
(#x20 | #x9 | #xD | #xA)+
Inherited from
TokenTests
final def isSpace(ch: Char): Boolean
(#x20 | #x9 | #xD | #xA)
Inherited from
TokenTests
def isValidIANAEncoding(ianaEncoding: Seq[Char]): Boolean

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value Params
ianaEncoding

The IANA encoding name.

Inherited from
TokenTests
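A purely syntactic check of this kind might look like the sketch below. It assumes the rule is the EncName production of XML 1.0 ([81]: a basic Latin letter followed by letters, digits, '.', '_' or '-'); the object and method names are hypothetical and this is not the TokenTests source.

```scala
object EncodingNames {
  private def isBasicLetter(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

  private def isBasicDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
  // Only the characters are checked; no decoder lookup is attempted.
  def looksLikeEncodingName(name: Seq[Char]): Boolean =
    name.nonEmpty && isBasicLetter(name.head) &&
      name.tail.forall { c =>
        isBasicLetter(c) || isBasicDigit(c) || c == '.' || c == '_' || c == '-'
      }
}
```

A name can pass this check and still have no decoder available, which is the caveat the description above spells out.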
def lookahead(): BufferedIterator[Char]
Inherited from
MarkupParser
def lookupElemDecl(Label: String): ElemDecl
Inherited from
MarkupHandler
def markupDecl(): Unit
Inherited from
MarkupParser
def markupDecl1(): Matchable
Inherited from
MarkupParser
def mkAttributes(name: String, pscope: NamespaceBinding): AttributesType
Inherited from
MarkupParser
def mkProcInstr(position: Int, name: String, text: String): ElementType
Inherited from
MarkupParser
def nextch(): Unit

this method tells ch to get the next character when next called

Inherited from
MarkupParser
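The interplay between ch, nextch and eof can be illustrated with a small stand-alone cursor. This is a simplified sketch of the lazy-advance pattern the description suggests, not the MarkupParser source; the class `LazyCursor` and the helper `readAll` are hypothetical, while the field names echo the inherited fields listed on this page.

```scala
object LazyCursorDemo {
  final class LazyCursor(input: Iterator[Char]) {
    private var nextChNeeded = true        // true: the next call to ch must read from input
    private var lastChRead: Char = 0.toChar
    private var reachedEof = false

    // Current character; reads from the input only when an advance was requested.
    def ch: Char = {
      if (nextChNeeded) {
        if (input.hasNext) lastChRead = input.next()
        else { reachedEof = true; lastChRead = 0.toChar }
        nextChNeeded = false
      }
      lastChRead
    }

    // Does not read anything itself; it marks that the following ch call must advance.
    def nextch(): Unit = {
      ch
      nextChNeeded = true
    }

    // Forces a pending read so that end of input is detected.
    def eof: Boolean = { ch; reachedEof }
  }

  def readAll(s: String): String = {
    val cur = new LazyCursor(s.iterator)
    val sb = new StringBuilder
    while (!cur.eof) {
      sb.append(cur.ch)
      cur.nextch()
    }
    sb.toString
  }
}
```

Because the read is deferred, calling ch repeatedly between two nextch() calls returns the same character.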
def notationDecl(): Unit
'N' notationDecl ::= "OTATION"
Inherited from
MarkupParser
def notationDecl(notat: String, extID: ExternalID): Unit
Inherited from
MarkupHandler
def parameterEntityDecl(name: String, edef: EntityDef): Unit
Inherited from
MarkupHandler
def parseDTD(): Unit

parses document type declaration and assigns it to instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >
Inherited from
MarkupParser
def parsedEntityDecl(name: String, edef: EntityDef): Unit
Inherited from
MarkupHandler
def peReference(name: String): Unit
Inherited from
MarkupHandler
def pop(): Unit
Inherited from
MarkupParser
def procInstr(pos: Int, target: String, txt: String): NodeSeq
Inherited from
ConstructingHandler
def prolog(): (Option[String], Option[String], Option[Boolean])
<? prolog ::= xml S?
// this is a bit more lenient than necessary...
Inherited from
MarkupParser
def pubidLiteral(): String
[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
Inherited from
MarkupParser
def push(entityName: String): Unit
Inherited from
MarkupParser
def pushExternal(systemId: String): Unit
Inherited from
MarkupParser
protected def putChar(c: Char): StringBuilder

append Unicode character to name buffer

Inherited from
MarkupParser
def replacementText(entityName: String): Source
Inherited from
MarkupHandler
def reportSyntaxError(str: String): Unit
Inherited from
MarkupParser
def reportSyntaxError(pos: Int, str: String): Unit
Inherited from
MarkupParser
def reportValidationError(pos: Int, str: String): Unit
Inherited from
MarkupParser
def returning[T](x: T)(f: T => Unit): T

Apply a function and return the passed value

Inherited from
MarkupParserCommon
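A stand-alone helper with the same signature shows the intent: run a side-effecting function on a value, then hand the value back. The sketch below is an assumption about the behaviour based on the description, and `ReturningDemo` and `build` are hypothetical names.

```scala
object ReturningDemo {
  // Same shape as the signature above: apply f for its side effect, return x.
  def returning[T](x: T)(f: T => Unit): T = { f(x); x }

  // Builds and fills a StringBuilder in a single expression.
  def build(): StringBuilder =
    returning(new StringBuilder) { sb =>
      sb.append("<p>")
      sb.append("</p>")
    }
}
```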
def saving[A, B](getter: A, setter: A => Unit)(body: => B): B

Execute body with a variable saved and restored after execution

Inherited from
MarkupParserCommon
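The save-and-restore idea can be sketched as follows, assuming the saved value is restored even when the body throws. The object `SavingDemo`, the variable `depth` and the helper `nested` are hypothetical.

```scala
object SavingDemo {
  var depth: Int = 0

  // Remember the current value, run body, then restore the remembered value.
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter
    try body
    finally setter(saved)
  }

  // Returns the value seen inside the body and the value after restoration.
  def nested(): (Int, Int) = {
    val inside = saving[Int, Int](depth, (d: Int) => { depth = d }) {
      depth = 5
      depth
    }
    (inside, depth)
  }
}
```

Note that `getter` is passed by value, so it is read once when saving is called, before the body runs.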
def systemLiteral(): String

system literal, terminated by either ' or ".

     SystemLiteral ::= `'` { _ } `'`
                     | `"` { _ } `"`
Inherited from
MarkupParser
def text(pos: Int, txt: String): NodeSeq
Inherited from
ConstructingHandler
def textDecl(): (Option[String], Option[String])

prolog, but without standalone

Inherited from
MarkupParser
def truncatedError(msg: String): Nothing
Inherited from
MarkupParser
def unparsedEntityDecl(name: String, extID: ExternalID, notat: String): Unit
Inherited from
MarkupHandler
protected def unreachable: Nothing
Inherited from
MarkupParserCommon
def xAttributeValue(): String
Inherited from
MarkupParserCommon
def xAttributeValue(endCh: Char): String

attribute value, terminated by either ' or ". value may not contain <.

Value Params
endCh

either ' or "

Inherited from
MarkupParserCommon

parse attribute and create namespace scope, metadata

[41] Attributes    ::= { S Name Eq AttValue }
Inherited from
MarkupParser
'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

see [15]
Inherited from
MarkupParser
def xCharRef: String
Inherited from
MarkupParserCommon
def xCharRef(it: Iterator[Char]): String
Inherited from
MarkupParserCommon
def xCharRef(ch: () => Char, nextch: () => Unit): String

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Inherited from
MarkupParserCommon
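The CharRef production can be illustrated with a small decoder over a complete reference string. This sketch differs from xCharRef in that it takes the whole reference including "&#" and ";" and returns None on malformed input instead of reporting a syntax error; `CharRefDemo` and `decode` are hypothetical names.

```scala
object CharRefDemo {
  private def isDigitIn(c: Char, base: Int): Boolean =
    (c >= '0' && c <= '9') ||
      (base == 16 && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')))

  // Decodes "&#NNN;" (decimal) or "&#xHHH;" (hexadecimal) into the referenced character.
  def decode(ref: String): Option[String] = {
    if (!ref.startsWith("&#") || !ref.endsWith(";")) None
    else {
      val body = ref.substring(2, ref.length - 1)
      val (digits, base) =
        if (body.startsWith("x")) (body.substring(1), 16) else (body, 10)
      if (digits.isEmpty || !digits.forall(c => isDigitIn(c, base))) None
      else {
        try {
          val codePoint = Integer.parseInt(digits, base)
          if (Character.isValidCodePoint(codePoint))
            Some(new String(Character.toChars(codePoint)))
          else None
        } catch {
          case _: NumberFormatException => None // more digits than fit in an Int
        }
      }
    }
  }
}
```

Code points above the Basic Multilingual Plane come back as a two-character string (a surrogate pair), which is why the result type is String rather than Char.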
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]
Inherited from
MarkupParser
def xEQ(): Unit

scan [S] '=' [S]

Inherited from
MarkupParserCommon
def xEndTag(startName: String): Unit

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Inherited from
MarkupParserCommon
def xEntityValue(): String

entity value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`
Inherited from
MarkupParser
def xHandleError(that: Char, msg: String): Unit
Inherited from
MarkupParser
def xName: String

actually, Name ::= (Letter | '_' | ':') (NameChar)*  but starting with ':' cannot happen

Name ::= (Letter | '_') (NameChar)*

see [5] of XML 1.0 specification

pre-condition: ch != ':' (assured by definition of XMLSTART token)

post-condition: name neither starts nor ends with ':'

Inherited from
MarkupParserCommon

'<?' ProcInstr ::= Name [S ({Char} - ({Char} '?>' {Char}))] '?>'

see [15]

Inherited from
MarkupParserCommon
def xSpace(): Unit

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Inherited from
MarkupParserCommon
def xSpaceOpt(): Unit

skip optional space S?

Inherited from
MarkupParserCommon
protected def xTag(pscope: NamespaceType): (String, AttributesType)

parse a start or empty tag.

[40] STag         ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Inherited from
MarkupParserCommon
protected def xTakeUntil[T](handler: (PositionType, String) => T, positioner: () => PositionType, until: String): T

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Inherited from
MarkupParserCommon
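The scanning behaviour described above can be sketched over a plain character iterator. This is an illustration rather than the MarkupParserCommon source: the position handed to the handler is assumed to be the number of characters consumed so far, a missing terminator yields None instead of a truncation error, and `TakeUntilDemo` and `takeUntil` are hypothetical names.

```scala
object TakeUntilDemo {
  // Consumes input until `until` has been seen, then passes the position and the
  // accumulated text (without the terminator) to handler.
  def takeUntil[T](input: Iterator[Char], until: String)(handler: (Int, String) => T): Option[T] = {
    require(until.nonEmpty, "terminator must not be empty")
    val sb = new StringBuilder
    var pos = 0
    var result: Option[T] = None
    while (result.isEmpty && input.hasNext) {
      sb.append(input.next())
      pos += 1
      val start = sb.length - until.length
      if (start >= 0 && sb.substring(start) == until)
        result = Some(handler(pos, sb.substring(0, start)))
    }
    result
  }
}
```

Scanning a CDATA body up to "]]>" or a comment body up to "-->" are typical uses of this pattern; characters after the terminator are left in the iterator.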
def xToken(that: Seq[Char]): Unit
Inherited from
MarkupParserCommon
def xToken(that: Char): Unit
Inherited from
MarkupParserCommon
<? prolog ::= xml S ... ?>
Inherited from
MarkupParser

Concrete fields

val input: Source
val preserveWS: Boolean

Inherited fields

protected val cbuf: StringBuilder

character buffer, for names

Inherited from
MarkupParser
protected var curInput: Source
Inherited from
MarkupParser
var decls: List[Decl]
Inherited from
MarkupHandler
protected var doc: Document
Inherited from
MarkupParser
var dtd: DTD
Inherited from
MarkupParser
var ent: Map[String, EntityDecl]
Inherited from
MarkupHandler
var extIndex: Int
Inherited from
MarkupParser
var inpStack: List[Source]

stack of inputs

Inherited from
MarkupParser
val isValidating: Boolean

returns true if this markup handler is validating

Inherited from
MarkupHandler
var lastChRead: Char
Inherited from
MarkupParser
var nextChNeeded: Boolean

holds the next character

Inherited from
MarkupParser
var pos: Int

holds the position in the source file

Inherited from
MarkupParser
var reachedEof: Boolean
Inherited from
MarkupParser
var tmppos: Int

holds temporary values of pos

Inherited from
MarkupParser