ConstructingParser

scala.xml.parsing.ConstructingParser
See the ConstructingParser companion object
class ConstructingParser(val input: Source, val preserveWS: Boolean) extends ConstructingHandler, ExternalSources, MarkupParser

An XML parser. Parses XML and invokes callback methods of a MarkupHandler. Don't forget to call nextch (or initialize) on a freshly instantiated parser in order to initialize it. If you get the parser from a method of the companion object, initialization is already done for you.

object parseFromURL {
  def main(args: Array[String]): Unit = {
    // the URL of the XML document is expected as the first command-line argument
    val url = args(0)
    val src = scala.io.Source.fromURL(url)
    val cpa = scala.xml.parsing.ConstructingParser.fromSource(src, false) // fromSource initializes automatically
    val doc = cpa.document()

    // let's see what it is
    val ppr = new scala.xml.PrettyPrinter(80, 5) // line width 80, indentation step 5
    val ele = doc.docElem
    println("finished parsing")
    val out = ppr.format(ele)
    println(out)
  }
}

Attributes

Companion
object
Graph
Supertypes
trait MarkupParser
trait TokenTests
class Object
trait Matchable
class Any
Show all

Members list

Type members

Inherited types

Attributes

Inherited from:
MarkupParser
override type ElementType = NodeSeq

Attributes

Inherited from:
MarkupParser
override type InputType = Source

Attributes

Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupParser
override type PositionType = Int

Attributes

Inherited from:
MarkupParser

Value members

Inherited methods

def appendText(pos: Int, ts: NodeBuffer, txt: String): Unit

Attributes

Inherited from:
MarkupParser
def attListDecl(name: String, attList: List[AttrDecl]): Unit

Attributes

Inherited from:
MarkupHandler
def attrDecl(): Unit
<! attlist := ATTLIST

Attributes

Inherited from:
MarkupParser
override def ch: Char

The library and compiler parsers had the interesting distinction of different behavior for nextch (a function for which there are a total of two plausible behaviors, so we know the design space was fully explored.) One of them returned the value of nextch before the increment and one of them the new value. So to unify code we have to at least temporarily abstract over the nextchs.

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
override protected def ch_returning_nextch: Char

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser

Attributes

Inherited from:
TokenTests

Attributes

Inherited from:
TokenTests
override def comment(pos: Int, txt: String): Comment

callback method invoked by MarkupParser after parsing comment.

Attributes

Definition Classes
Inherited from:
ConstructingHandler
content1 ::=  '<' content1 | '&' charref ...

Attributes

Inherited from:
MarkupParser
'<' content1 ::=  ...

Attributes

Inherited from:
MarkupParser
[22]     prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23]     XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]     VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]     Eq          ::= S? '=' S?
[26]     VersionNum  ::= '1.0'
[27]     Misc        ::= Comment | PI | S

Attributes

Inherited from:
MarkupParser
override def elem(pos: Int, pre: String, label: String, attrs: MetaData, pscope: NamespaceBinding, empty: Boolean, nodes: NodeSeq): NodeSeq

callback method invoked by MarkupParser after parsing an element, between the elemStart and elemEnd callbacks

Value parameters

nodes

the children of this element

attrs

the attributes (metadata)

empty

true if the element was previously empty; false otherwise.

label

the local name

pos

the position in the source file

pre

the prefix

Attributes

Definition Classes
Inherited from:
ConstructingHandler
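
Since ConstructingParser extends both MarkupParser and ConstructingHandler, a subclass can observe this callback by overriding it. The following is a minimal sketch rather than anything shipped with the library: CountingParser and ElemCallbackDemo are hypothetical names, and the sketch assumes the overridden elem is called once for every parsed element.

```scala
import scala.io.Source
import scala.xml.{MetaData, NamespaceBinding, NodeSeq}
import scala.xml.parsing.ConstructingParser

// Hypothetical subclass: counts elem callbacks, then defers to the inherited behaviour.
class CountingParser(src: Source) extends ConstructingParser(src, false) {
  var elementCount = 0

  override def elem(pos: Int, pre: String, label: String, attrs: MetaData,
                    pscope: NamespaceBinding, empty: Boolean, nodes: NodeSeq): NodeSeq = {
    elementCount += 1
    super.elem(pos, pre, label, attrs, pscope, empty, nodes)
  }
}

object ElemCallbackDemo {
  // Parses the given XML text and reports how many elem callbacks were observed.
  def countElements(xml: String): Int = {
    val parser = new CountingParser(Source.fromString(xml))
    parser.initialize.document() // a hand-built parser must be initialized first
    parser.elementCount
  }

  def main(args: Array[String]): Unit =
    println(countElements("<a><b/><c/></a>"))
}
```

Calling super.elem keeps the node construction of ConstructingHandler intact, so the resulting document should be unaffected by the counting.
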
def elemDecl(n: String, cmstr: String): Unit

Attributes

Inherited from:
MarkupHandler
def elemEnd(pos: Int, pre: String, label: String): Unit

callback method invoked by MarkupParser after end-tag of element.

Value parameters

label

the local name

pos

the position in the source file

pre

the prefix

Attributes

Inherited from:
MarkupHandler
def elemStart(pos: Int, pre: String, label: String, attrs: MetaData, scope: NamespaceBinding): Unit

callback method invoked by MarkupParser after start-tag of element.

Value parameters

attrs

the attributes (metadata)

label

the local name

pos

the position in the source file

pre

the prefix

Attributes

Inherited from:
MarkupHandler

Attributes

Inherited from:
MarkupParser
'<' element ::= xmlTag1 '>'  { xmlExpr | '{' simpleExpr '}' } ETag
             | xmlTag1 '/' '>'

Attributes

Inherited from:
MarkupParser
def elementDecl(): Unit

<! element := ELEMENT

Attributes

Inherited from:
MarkupParser
def endDTD(n: String): Unit

Attributes

Inherited from:
MarkupHandler
def entityDecl(): Unit
<! element := ELEMENT

Attributes

Inherited from:
MarkupParser
override def entityRef(pos: Int, n: String): EntityRef

callback method invoked by MarkupParser after parsing entity ref.

Attributes

Todo

expanding entity references

Definition Classes
Inherited from:
ConstructingHandler
override def eof: Boolean

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
protected def errorAndResult[T](msg: String, x: T): T

Attributes

Inherited from:
MarkupParserCommon (hidden)
override def errorNoEnd(tag: String): Nothing

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
def extSubset(): Unit

Attributes

Inherited from:
MarkupParser
externalID ::= SYSTEM S syslit
               PUBLIC S pubid S syslit

Attributes

Inherited from:
MarkupParser
def externalSource(systemId: String): Source

Attributes

Inherited from:
ExternalSources
def initialize: MarkupParser.this.type

As the current code requires you to call nextch once manually after construction, this method formalizes that suboptimal reality.

Attributes

Inherited from:
MarkupParser
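
A minimal sketch of constructing a parser by hand and initializing it through this method, instead of going through the companion object. ManualInit and its parse helper are hypothetical names; the input is an in-memory string so the example needs no network or file access.

```scala
import scala.io.Source
import scala.xml.Node
import scala.xml.parsing.ConstructingParser

object ManualInit {
  // Builds the parser directly, so initialization is our responsibility.
  def parse(xml: String): Node =
    new ConstructingParser(Source.fromString(xml), false)
      .initialize   // returns the parser itself, so calls can be chained
      .document()
      .docElem

  def main(args: Array[String]): Unit =
    println(parse("<greeting lang=\"en\">hello</greeting>"))
}
```

Because initialize returns the parser's own type, it can be chained directly into document().
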
def intSubset(): Unit

"rec-xml/#ExtSubset" pe references may not occur within markup declarations

Attributes

Inherited from:
MarkupParser
def isAlpha(c: Char): Boolean

These are 99% sure to be redundant but refactoring on the safe side.

Attributes

Inherited from:
TokenTests

Attributes

Inherited from:
TokenTests
def isName(s: String): Boolean

See [5] of XML 1.0 specification.

Name ::= ( Letter | '_' ) (NameChar)*

Attributes

Inherited from:
TokenTests
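
A short sketch of how isName can be tried out in isolation. It assumes that TokenTests can be mixed into an otherwise empty object; NameChecks is a hypothetical name.

```scala
import scala.xml.parsing.TokenTests

// Hypothetical helper object that exposes the inherited token tests.
object NameChecks extends TokenTests {
  def main(args: Array[String]): Unit = {
    // A letter or underscore may start a name; digits, '.' and '-' may follow.
    println(isName("title"))
    println(isName("_private-1.0"))
    // A leading digit and the empty string are rejected.
    println(isName("1st"))
    println(isName(""))
  }
}
```
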
def isNameChar(ch: Char): Boolean

See [4] and [4a] of Appendix B of XML 1.0 specification.

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | #xB7
           | CombiningChar | Extender

Attributes

Inherited from:
TokenTests

NameStart ::= ( Letter | '_' | ':' )

where Letter means in one of the Unicode general categories { Ll, Lu, Lo, Lt, Nl }.

We do not allow a name to start with :. See [4] and Appendix B of XML 1.0 specification.

Attributes

Inherited from:
TokenTests

Attributes

Inherited from:
TokenTests
final def isSpace(cs: Seq[Char]): Boolean
(#x20 | #x9 | #xD | #xA)+

Attributes

Inherited from:
TokenTests
final def isSpace(ch: Char): Boolean
(#x20 | #x9 | #xD | #xA)

Attributes

Inherited from:
TokenTests
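
The two isSpace overloads can be exercised as follows. This is only a sketch: SpaceChecks and allSpace are hypothetical names, and the String is converted explicitly with toSeq so that the Seq[Char] overload is selected.

```scala
import scala.xml.parsing.TokenTests

// Hypothetical helper object that exposes the inherited token tests.
object SpaceChecks extends TokenTests {
  // Wraps the Seq[Char] overload for convenient use with strings.
  def allSpace(s: String): Boolean = isSpace(s.toSeq)

  def main(args: Array[String]): Unit = {
    println(isSpace(' '))          // single-character overload
    println(isSpace('x'))
    println(allSpace(" \t\r\n"))   // only the four XML whitespace characters
    println(allSpace("a b"))       // contains non-whitespace
  }
}
```
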
def isValidIANAEncoding(ianaEncoding: Seq[Char]): Boolean

Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Value parameters

ianaEncoding

The IANA encoding name.

Attributes

Inherited from:
TokenTests
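
The distinction between a well-formed encoding name and an available decoder can be illustrated with a small sketch. EncodingChecks and wellFormed are hypothetical names, and "no-such-charset" is a made-up name chosen only because it is syntactically plausible.

```scala
import scala.xml.parsing.TokenTests

// Hypothetical helper object that exposes the inherited token tests.
object EncodingChecks extends TokenTests {
  // Checks only the shape of the name, not whether a decoder exists.
  def wellFormed(name: String): Boolean = isValidIANAEncoding(name.toSeq)

  def main(args: Array[String]): Unit = {
    println(wellFormed("UTF-8"))
    println(wellFormed("ISO-8859-1"))
    println(wellFormed("no-such-charset")) // made-up name, still syntactically fine
    println(wellFormed("8bit"))            // starts with a digit
    println(wellFormed("utf 8"))           // contains a space
  }
}
```
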
override def lookahead(): BufferedIterator[Char]

Create a lookahead reader which does not influence the input

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupHandler
def markupDecl(): Unit

Attributes

Inherited from:
MarkupParser
def markupDecl1(): Any

Attributes

Inherited from:
MarkupParser
override def mkAttributes(name: String, pscope: NamespaceBinding): AttributesType

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
override def mkProcInstr(position: Int, name: String, text: String): ElementType

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
override def nextch(): Unit

this method tells ch to get the next character when next called

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
'N' notationDecl ::= "OTATION"

Attributes

Inherited from:
MarkupParser
def notationDecl(notat: String, extID: ExternalID): Unit

Attributes

Inherited from:
MarkupHandler

Attributes

Inherited from:
MarkupHandler
def parseDTD(): Unit

parses document type declaration and assigns it to instance variable dtd.

<! parseDTD ::= DOCTYPE name ... >

Attributes

Inherited from:
MarkupParser
def parsedEntityDecl(name: String, edef: EntityDef): Unit

Attributes

Inherited from:
MarkupHandler
def peReference(name: String): Unit

Attributes

Inherited from:
MarkupHandler
def pop(): Unit

Attributes

Inherited from:
MarkupParser
override def procInstr(pos: Int, target: String, txt: String): ProcInstr

callback method invoked by MarkupParser after parsing PI.

Attributes

Definition Classes
Inherited from:
ConstructingHandler
<? prolog ::= xml S?
// this is a bit more lenient than necessary...

Attributes

Inherited from:
MarkupParser
[12]       PubidLiteral ::=        '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"

Attributes

Inherited from:
MarkupParser
def push(entityName: String): Unit

Attributes

Inherited from:
MarkupParser
def pushExternal(systemId: String): Unit

Attributes

Inherited from:
MarkupParser
protected def putChar(c: Char): StringBuilder

append Unicode character to name buffer

Attributes

Inherited from:
MarkupParser
def replacementText(entityName: String): Source

Attributes

Inherited from:
MarkupHandler
override def reportSyntaxError(str: String): Unit

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
override def reportSyntaxError(pos: Int, str: String): Unit

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupParser
def returning[T](x: T)(f: T => Unit): T

Apply a function and return the passed value

Attributes

Inherited from:
MarkupParserCommon (hidden)
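
The documented behaviour can be restated as a stand-alone sketch. This is a re-implementation for illustration under the signature shown above, not the library source; ReturningSketch is a hypothetical name.

```scala
object ReturningSketch {
  // Runs the side-effecting function on x, then hands x back unchanged.
  def returning[T](x: T)(f: T => Unit): T = { f(x); x }

  def main(args: Array[String]): Unit = {
    // Build and configure a value in a single expression.
    val sb = returning(new StringBuilder) { b => b.append("abc"); () }
    println(sb.toString)
  }
}
```
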
def saving[A, B](getter: A, setter: A => Unit)(body: => B): B

Execute body with a variable saved and restored after execution

Attributes

Inherited from:
MarkupParserCommon (hidden)
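
A stand-alone sketch of the save-and-restore pattern this signature describes. It is a re-implementation for illustration, not the library source; SavingSketch, depth and demo are hypothetical names, and restoring in a finally block is an assumption of this sketch.

```scala
object SavingSketch {
  // Remembers the current value, runs body, then writes the remembered value back.
  def saving[A, B](getter: A, setter: A => Unit)(body: => B): B = {
    val saved = getter
    try body
    finally setter(saved) // assumption: restore even if body throws
  }

  var depth = 0

  // Returns the value seen inside the body and the value after restoration.
  def demo(): (Int, Int) = {
    val inside = saving(depth, (d: Int) => depth = d) {
      depth = 5
      depth
    }
    (inside, depth)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

Note that getter is passed by value, so the value to restore is captured at the moment saving is called.
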

attribute value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _ } `'`
                    | `"` { _ } `"`

Attributes

Inherited from:
MarkupParser
override def text(pos: Int, txt: String): Text

callback method invoked by MarkupParser after parsing text.

Attributes

Definition Classes
Inherited from:
ConstructingHandler

prolog, but without standalone

Attributes

Inherited from:
MarkupParser
override def truncatedError(msg: String): Nothing

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
def unparsedEntityDecl(name: String, extID: ExternalID, notat: String): Unit

Attributes

Inherited from:
MarkupHandler
protected def unreachable: Nothing

Attributes

Inherited from:
MarkupParserCommon (hidden)

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xAttributeValue(endCh: Char): String

attribute value, terminated by either ' or ". value may not contain <.

Value parameters

endCh

either ' or "

Attributes

Inherited from:
MarkupParserCommon (hidden)

parse attribute and create namespace scope, metadata

[41] Attributes    ::= { S Name Eq AttValue }

Attributes

Inherited from:
MarkupParser
'<! CharData ::= [CDATA[ ( {char} - {char}"]]>"{char} ) ']]>'

see [15]

Attributes

Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupParserCommon (hidden)

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xCharRef(ch: () => Char, nextch: () => Unit): String

CharRef ::= "&#" '0'..'9' {'0'..'9'} ";" | "&#x" '0'..'9'|'A'..'F'|'a'..'f' { hexdigit } ";"

see [66]

Attributes

Inherited from:
MarkupParserCommon (hidden)
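
Rather than calling xCharRef directly, the effect of the CharRef production can be observed by parsing a small document that contains decimal and hexadecimal references. A minimal sketch, with CharRefDemo and textOf as hypothetical names:

```scala
import scala.io.Source
import scala.xml.parsing.ConstructingParser

object CharRefDemo {
  // Parses the XML text and returns the text content of the root element.
  def textOf(xml: String): String =
    ConstructingParser.fromSource(Source.fromString(xml), false).document().docElem.text

  def main(args: Array[String]): Unit =
    // &#65; is decimal 65 and &#x42; is hexadecimal 42
    println(textOf("<a>&#65;&#x42;</a>"))
}
```
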
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

see [15]

Attributes

Inherited from:
MarkupParser
def xEQ(): Unit

scan [S] '=' [S]

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xEndTag(startName: String): Unit

[42] '<' xmlEndTag ::= '<' '/' Name S? '>'

Attributes

Inherited from:
MarkupParserCommon (hidden)

entity value, terminated by either ' or ". value may not contain <.

     AttValue     ::= `'` { _  } `'`
                    | `"` { _ } `"`

Attributes

Inherited from:
MarkupParser
override def xHandleError(that: Char, msg: String): Unit

Attributes

Definition Classes
MarkupParser -> MarkupParserCommon
Inherited from:
MarkupParser
def xName: String

Actually, Name ::= (Letter | '_' | ':') (NameChar)*, but starting with ':' cannot happen, so effectively:

Name ::= (Letter | '_') (NameChar)*

See [5] of XML 1.0 specification.

Pre-condition: ch != ':' (assured by definition of XMLSTART token).
Post-condition: name neither starts nor ends with ':'.

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xProcInstr: ElementType

'?' {Char})]'?>'

see [15]

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xSpace(): Unit

scan [3] S ::= (#x20 | #x9 | #xD | #xA)+

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xSpaceOpt(): Unit

skip optional space S?

Attributes

Inherited from:
MarkupParserCommon (hidden)
protected def xTag(pscope: NamespaceType): (String, AttributesType)

Parse a start or empty tag.

[40] STag         ::= '<' Name { S Attribute } [S]
[44] EmptyElemTag ::= '<' Name { S Attribute } [S]

Attributes

Inherited from:
MarkupParserCommon (hidden)
protected def xTakeUntil[T](handler: (PositionType, String) => T, positioner: () => PositionType, until: String): T

Take characters from input stream until given String "until" is seen. Once seen, the accumulated characters are passed along with the current Position to the supplied handler function.

Attributes

Inherited from:
MarkupParserCommon (hidden)
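
The scan-until-terminator idea can be sketched over a plain String. This is a simplified stand-alone illustration, not the library implementation: TakeUntilSketch and takeUntil are hypothetical names, the position passed to the handler is assumed to be the index where the terminator starts, and a missing terminator is modelled as None.

```scala
object TakeUntilSketch {
  // Scans input from index start for the terminator until.
  // On a match, hands (index of the terminator, text before it) to handler.
  // Returns None if the terminator never occurs; how the real parser
  // reports that case is not modelled here.
  def takeUntil[T](input: String, start: Int, until: String)(handler: (Int, String) => T): Option[T] = {
    val end = input.indexOf(until, start)
    if (end < 0) None
    else Some(handler(end, input.substring(start, end)))
  }

  def main(args: Array[String]): Unit =
    // Start just after the comment opener and collect text up to the closer.
    println(takeUntil("<!-- hi -->rest", 4, "-->")((pos, text) => s"$pos:[$text]"))
}
```
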
def xToken(that: Seq[Char]): Unit

Attributes

Inherited from:
MarkupParserCommon (hidden)
def xToken(that: Char): Unit

Attributes

Inherited from:
MarkupParserCommon (hidden)
<? prolog ::= xml S ... ?>

Attributes

Inherited from:
MarkupParser

Concrete fields

override val input: Source
override val preserveWS: Boolean

if true, does not remove surplus whitespace

Attributes
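
The effect of this flag can be seen by parsing the same input twice and comparing the number of child nodes under the root. A minimal sketch, with PreserveWsDemo and childCounts as hypothetical names; it assumes that whitespace-only text between elements counts as surplus whitespace.

```scala
import scala.io.Source
import scala.xml.Node
import scala.xml.parsing.ConstructingParser

object PreserveWsDemo {
  private def parse(xml: String, preserveWS: Boolean): Node =
    ConstructingParser.fromSource(Source.fromString(xml), preserveWS).document().docElem

  // Number of direct children of the root with preserveWS = true and = false.
  def childCounts(xml: String): (Int, Int) =
    (parse(xml, preserveWS = true).child.length, parse(xml, preserveWS = false).child.length)

  def main(args: Array[String]): Unit = {
    val (kept, dropped) = childCounts("<a> <b/> </a>")
    println(s"preserveWS=true: $kept children, preserveWS=false: $dropped children")
  }
}
```
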

Inherited fields

protected val cbuf: StringBuilder

character buffer, for names

Attributes

Inherited from:
MarkupParser
protected var curInput: Source

Attributes

Inherited from:
MarkupParser
var decls: List[Decl]

Attributes

Inherited from:
MarkupHandler
protected var doc: Document

Attributes

Inherited from:
MarkupParser
var dtd: DTD

Attributes

Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupHandler
var extIndex: Int

Attributes

Inherited from:
MarkupParser

stack of inputs

Attributes

Inherited from:
MarkupParser

returns true if this markup handler is validating

Attributes

Inherited from:
MarkupHandler

Attributes

Inherited from:
MarkupParser

holds the next character

Attributes

Inherited from:
MarkupParser
var pos: Int

holds the position in the source file

Attributes

Inherited from:
MarkupParser

Attributes

Inherited from:
MarkupParser
var tmppos: Int

holds temporary values of pos

Attributes

Inherited from:
MarkupParser