Parser (jsoup 1.8.2 API)

org.jsoup.parser
Class Parser

java.lang.Object
  org.jsoup.parser.Parser

public class Parser
extends Object

Parses HTML into a Document. It is generally best to use one of the more convenient parse methods in Jsoup.

Constructor Summary
`Parser(org.jsoup.parser.TreeBuilder treeBuilder)` Create a new Parser, using the specified TreeBuilder.

Method Summary
`List<ParseError>`	`getErrors()` Retrieve the parse errors, if any, from the last parse.
`org.jsoup.parser.TreeBuilder`	`getTreeBuilder()` Get the TreeBuilder currently in use.
`static Parser`	`htmlParser()` Create a new HTML parser.
`boolean`	`isTrackErrors()` Check if parse error tracking is enabled.
`static Document`	`parse(String html, String baseUri)` Parse HTML into a Document.
`static Document`	`parseBodyFragment(String bodyHtml, String baseUri)` Parse a fragment of HTML into the `body` of a Document.
`static Document`	`parseBodyFragmentRelaxed(String bodyHtml, String baseUri)` Deprecated. Use `parseBodyFragment(java.lang.String, java.lang.String)` or `parseFragment(java.lang.String, org.jsoup.nodes.Element, java.lang.String)` instead.
`static List<Node>`	`parseFragment(String fragmentHtml, Element context, String baseUri)` Parse a fragment of HTML into a list of nodes.
`Document`	`parseInput(String html, String baseUri)`
`static List<Node>`	`parseXmlFragment(String fragmentXml, String baseUri)` Parse a fragment of XML into a list of nodes.
`Parser`	`setTrackErrors(int maxErrors)` Enable or disable parse error tracking for the next parse.
`Parser`	`setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)` Update the TreeBuilder used when parsing content.
`static String`	`unescapeEntities(String string, boolean inAttribute)` Utility method to unescape HTML entities from a string.
`static Parser`	`xmlParser()` Create a new XML parser.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

Parser

public Parser(org.jsoup.parser.TreeBuilder treeBuilder)

Create a new Parser, using the specified TreeBuilder.

Parameters:: treeBuilder - TreeBuilder to use to parse input into Documents.

Method Detail

parseInput

public Document parseInput(String html,
                           String baseUri)

getTreeBuilder

public org.jsoup.parser.TreeBuilder getTreeBuilder()

Get the TreeBuilder currently in use.

Returns:: current TreeBuilder.

setTreeBuilder

public Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)

Update the TreeBuilder used when parsing content.

Parameters:: treeBuilder - the new TreeBuilder to use
Returns:: this, for chaining

isTrackErrors

public boolean isTrackErrors()

Check if parse error tracking is enabled.

Returns:: current track error state.

setTrackErrors

public Parser setTrackErrors(int maxErrors)

Enable or disable parse error tracking for the next parse.

Parameters:: maxErrors - the maximum number of errors to track. Set to 0 to disable.
Returns:: this, for chaining

getErrors

public List<ParseError> getErrors()

Retrieve the parse errors, if any, from the last parse.

Returns:: list of parse errors, up to the size of the maximum errors tracked.
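
The error-tracking methods above (setTrackErrors, parseInput, getErrors) are typically used together on one Parser instance. The following is a minimal sketch, assuming jsoup is on the classpath; the class name `TrackErrorsExample`, the helper `parseAndCollect`, and the sample markup are hypothetical and not part of the jsoup API.

```java
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

import java.util.List;

public class TrackErrorsExample {
    // Hypothetical helper: parse html with error tracking capped at maxErrors
    // and return whatever errors the parser recorded for that parse.
    static List<ParseError> parseAndCollect(String html, int maxErrors) {
        // setTrackErrors returns the parser itself, so the call can be chained.
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
        parser.parseInput(html, "http://example.com/");
        return parser.getErrors();
    }

    public static void main(String[] args) {
        // Deliberately sloppy markup; which errors are reported depends on the parser.
        String html = "<p>Unclosed <b>bold</div>";
        for (ParseError error : parseAndCollect(html, 10)) {
            System.out.println(error.getPosition() + ": " + error.getErrorMessage());
        }
    }
}
```

Passing 0 as `maxErrors` disables tracking, so the returned list stays empty regardless of the input.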

parse

public static Document parse(String html,
                             String baseUri)

Parse HTML into a Document.

Parameters:: html - HTML to parse; baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:: parsed Document
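
To illustrate what the baseUri parameter is for, here is a small sketch that parses a document and resolves a relative link against the base URI. It assumes jsoup is on the classpath; the class name `ParseBaseUriExample`, the helper `resolveFirstLink`, and the example.com URLs are made up for illustration.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class ParseBaseUriExample {
    // Hypothetical helper: return the absolute URL of the first link in html,
    // resolved against baseUri, or an empty string if there is no link.
    static String resolveFirstLink(String html, String baseUri) {
        Document doc = Parser.parse(html, baseUri);
        Element link = doc.select("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href='/docs/page.html'>Docs</a>";
        // prints "http://example.com/docs/page.html"
        System.out.println(resolveFirstLink(html, "http://example.com/base/"));
    }
}
```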

parseFragment

public static List<Node> parseFragment(String fragmentHtml,
                                       Element context,
                                       String baseUri)

Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.

Parameters:: fragmentHtml - the fragment of HTML to parse; context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML), which provides stack context (for implicit element creation); baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:: list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
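
As a sketch of fragment parsing with a context element, the example below parses two unclosed paragraphs as if they were the inner HTML of a body element. It assumes jsoup is on the classpath; the class name `ParseFragmentExample` and the helper `parseInBody` are hypothetical, and using `Document.createShell` to obtain a context element is just one convenient choice.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

import java.util.List;

public class ParseFragmentExample {
    // Hypothetical helper: parse fragmentHtml in the context of the given element.
    static List<Node> parseInBody(String fragmentHtml, Element context, String baseUri) {
        return Parser.parseFragment(fragmentHtml, context, baseUri);
    }

    public static void main(String[] args) {
        String baseUri = "http://example.com/";
        // An empty html/head/body skeleton; its body serves as the parsing context.
        Element context = Document.createShell(baseUri).body();

        // The second <p> implicitly closes the first, giving two sibling paragraphs.
        List<Node> nodes = parseInBody("<p>One<p>Two", context, baseUri);
        for (Node node : nodes) {
            System.out.println(node.nodeName());
        }
        // The context element itself is left untouched.
        System.out.println(context.childNodeSize()); // prints 0
    }
}
```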

parseXmlFragment

public static List<Node> parseXmlFragment(String fragmentXml,
                                          String baseUri)

Parse a fragment of XML into a list of nodes.

Parameters:: fragmentXml - the fragment of XML to parse; baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:: list of nodes parsed from the input XML.
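
A short sketch of XML fragment parsing follows, assuming jsoup is on the classpath. The class name `ParseXmlFragmentExample`, the helper `topLevelNames`, and the `item` markup are invented for this illustration.

```java
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class ParseXmlFragmentExample {
    // Hypothetical helper: return the node names of the top-level nodes in an XML fragment.
    static List<String> topLevelNames(String fragmentXml) {
        List<String> names = new ArrayList<String>();
        for (Node node : Parser.parseXmlFragment(fragmentXml, "")) {
            names.add(node.nodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No html, head, or body wrapper is added; the two items come back as siblings.
        System.out.println(topLevelNames("<item id=\"1\">One</item><item id=\"2\">Two</item>"));
    }
}
```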

parseBodyFragment

public static Document parseBodyFragment(String bodyHtml,
                                         String baseUri)

Parse a fragment of HTML into the body of a Document.

Parameters:: bodyHtml - fragment of HTML; baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:: Document, with empty head, and HTML parsed into body
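
The sketch below shows the shape of the Document returned for a body fragment: an empty head and the fragment placed in the body. It assumes jsoup is on the classpath; the class name `ParseBodyFragmentExample` and the helper `parseSnippet` are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParseBodyFragmentExample {
    // Hypothetical helper: wrap a snippet of body HTML in a full Document.
    static Document parseSnippet(String bodyHtml) {
        return Parser.parseBodyFragment(bodyHtml, "http://example.com/");
    }

    public static void main(String[] args) {
        Document doc = parseSnippet("<p>Hello <b>world</b>");
        System.out.println(doc.head().children().size()); // prints 0
        System.out.println(doc.body().text());            // prints "Hello world"
        System.out.println(doc.body().select("p").size()); // prints 1
    }
}
```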

unescapeEntities

public static String unescapeEntities(String string,
                                      boolean inAttribute)

Utility method to unescape HTML entities from a string.

Parameters:: string - HTML escaped string; inAttribute - if the string is to be unescaped in strict mode (as attributes are)
Returns:: an unescaped string
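
The following sketch contrasts the two modes of unescapeEntities, assuming jsoup is on the classpath. The class name `UnescapeExample` and the sample strings are illustrative only. In strict mode (inAttribute set to true), a named entity that lacks its trailing semicolon and is followed by a character such as `=` is left as written.

```java
import org.jsoup.parser.Parser;

public class UnescapeExample {
    // Hypothetical helpers wrapping the two modes of Parser.unescapeEntities.
    static String lenient(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    static String strict(String escaped) {
        return Parser.unescapeEntities(escaped, true);
    }

    public static void main(String[] args) {
        System.out.println(lenient("&lt;b&gt; Fish &amp; chips")); // prints "<b> Fish & chips"

        String ambiguous = "Hello &amp= &amp;";
        System.out.println(lenient(ambiguous)); // prints "Hello &= &"
        System.out.println(strict(ambiguous));  // prints "Hello &amp= &"
    }
}
```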

parseBodyFragmentRelaxed

public static Document parseBodyFragmentRelaxed(String bodyHtml,
                                                String baseUri)

Deprecated. Use parseBodyFragment(java.lang.String, java.lang.String) or parseFragment(java.lang.String, org.jsoup.nodes.Element, java.lang.String) instead.

Parameters:: bodyHtml - HTML to parse; baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:: parsed Document

htmlParser

public static Parser htmlParser()

Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on knowledge of the semantics of the incoming tags.

Returns:: a new HTML parser.

xmlParser

public static Parser xmlParser()

Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; instead, it creates a simple tree directly from the input.

Returns:: a new simple XML parser.
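
To make the difference between htmlParser() and xmlParser() concrete, this sketch feeds the same input to both and inspects the resulting trees. It assumes jsoup is on the classpath; the class name `HtmlVsXmlExample`, the helper names, and the `item` markup are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class HtmlVsXmlExample {
    // Hypothetical helpers: parse the same input with each kind of parser.
    static Document asHtml(String input) {
        return Parser.htmlParser().parseInput(input, "");
    }

    static Document asXml(String input) {
        return Parser.xmlParser().parseInput(input, "");
    }

    public static void main(String[] args) {
        String input = "<item>One</item>";

        // The HTML parser normalises the document, creating html, head, and body.
        System.out.println(asHtml(input).select("body").size()); // prints 1

        // The XML parser builds the tree as given, with no body wrapper.
        Document xml = asXml(input);
        System.out.println(xml.select("body").size()); // prints 0
        System.out.println(xml.child(0).tagName());    // prints "item"
    }
}
```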