Jsoup (jsoup 0.2.2 API)


org.jsoup
Class Jsoup

java.lang.Object
  org.jsoup.Jsoup

public class Jsoup
extends Object

The core public access point to the jsoup functionality.

Author:: Jonathan Hedley

Method Summary
`static String`	`clean(String bodyHtml, String baseUri, Whitelist whitelist)` Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
`static String`	`clean(String bodyHtml, Whitelist whitelist)` Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
`static Document`	`parse(File in, String charsetName)` Parse the contents of a file as HTML.
`static Document`	`parse(File in, String charsetName, String baseUri)` Parse the contents of a file as HTML.
`static Document`	`parse(String html)` Parse HTML into a Document.
`static Document`	`parse(String html, String baseUri)` Parse HTML into a Document.
`static Document`	`parse(URL url, int timeoutMillis)` Fetch a URL, and parse it as HTML.
`static Document`	`parseBodyFragment(String bodyHtml)` Parse a fragment of HTML, with the assumption that it forms the `body` of the HTML.
`static Document`	`parseBodyFragment(String bodyHtml, String baseUri)` Parse a fragment of HTML, with the assumption that it forms the `body` of the HTML.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Method Detail

parse

public static Document parse(String html,
                             String baseUri)

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.

Parameters:: html - HTML to parse; baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs that appear before any <base href> tag in the HTML into absolute URLs.
Returns:: sane HTML document
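This page does not show jsoup's internals. As a stdlib-only sketch of what a base URI is for, `java.net.URI.resolve` applies standard relative-reference resolution against a base. The class `BaseUriDemo` and its helper `absolutize` are hypothetical names, and this is not jsoup's own code.

```java
import java.net.URI;

class BaseUriDemo {
    // Hypothetical helper: resolve a relative reference against a base URI
    // using java.net.URI (dot segments such as "../" are normalized).
    static String absolutize(String baseUri, String relative) {
        return URI.create(baseUri).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/articles/index.html";
        System.out.println(absolutize(base, "images/photo.jpg")); // http://example.com/articles/images/photo.jpg
        System.out.println(absolutize(base, "/about"));           // http://example.com/about
        System.out.println(absolutize(base, "../feed.xml"));      // http://example.com/feed.xml
    }
}
```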

parse

public static Document parse(String html)

Parse HTML into a Document. Because no base URI is specified, relative URLs can only be resolved to absolute URLs if the HTML includes a <base href> tag.

Parameters:: html - HTML to parse
Returns:: sane HTML document
See Also:: parse(String, String)

parse

public static Document parse(URL url,
                             int timeoutMillis)
                      throws IOException

Fetch a URL, and parse it as HTML.

Parameters:: url - URL to fetch (with a GET). The protocol must be http or https; timeoutMillis - Connection and read timeout, in milliseconds. If exceeded, an IOException is thrown.
Returns:: The parsed HTML.
Throws:: IOException - If the final server response is not 200 OK (redirects are followed), or if there is an error reading the response stream.
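The page does not show how the fetch is performed. As a stdlib-only sketch of the documented contract (GET only, http or https only, one timeout applied to both connect and read, IOException on a non-200 final response), the following uses `java.net.HttpURLConnection`. `FetchSketch` and `fetch` are hypothetical names; rejecting other protocols with an IllegalArgumentException and decoding the body as UTF-8 are assumptions, not documented jsoup behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class FetchSketch {
    // Hypothetical helper mirroring the documented contract of parse(URL, int).
    static String fetch(URL url, int timeoutMillis) throws IOException {
        String protocol = url.getProtocol();
        if (!protocol.equals("http") && !protocol.equals("https")) {
            // Assumption: the exception type for an unsupported protocol is not documented.
            throw new IllegalArgumentException("Only http and https URLs are supported: " + url);
        }
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setInstanceFollowRedirects(true); // follows redirects, but not across http/https
            conn.setConnectTimeout(timeoutMillis);  // exceeding it raises SocketTimeoutException (an IOException)
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: decode as UTF-8; real code should honour the Content-Type charset.
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```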

parse

public static Document parse(File in,
                             String charsetName,
                             String baseUri)
                      throws IOException

Parse the contents of a file as HTML.

Parameters:: in - file to load HTML from; charsetName - character set of file contents. If you don't know the charset, generally the best guess is UTF-8; baseUri - The URL where the HTML was retrieved from, against which relative URLs are resolved to absolute URLs.
Returns:: sane HTML document
Throws:: IOException - if the file could not be found, or read, or if the charsetName is invalid.
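In the JDK, an invalid charset name surfaces as an unchecked exception (`IllegalCharsetNameException` or `UnsupportedCharsetException`, both subclasses of `IllegalArgumentException`), whereas this method reports it as an IOException. A minimal stdlib sketch of that contract follows; `FileReadSketch` and `readWithCharset` are hypothetical names and this is not jsoup's implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

class FileReadSketch {
    // Hypothetical helper: read a file with a named charset, mapping a bad
    // charset name to IOException as the Throws clause above describes.
    static String readWithCharset(File in, String charsetName) throws IOException {
        Charset charset;
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            throw new IOException("Invalid charset: " + charsetName, e);
        }
        // A missing or unreadable file raises an IOException (e.g. NoSuchFileException).
        return new String(Files.readAllBytes(in.toPath()), charset);
    }
}
```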

parse

public static Document parse(File in,
                             String charsetName)
                      throws IOException

Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.

Parameters:: in - file to load HTML from; charsetName - character set of file contents. If you don't know the charset, generally the best guess is UTF-8.
Returns:: sane HTML document
Throws:: IOException - if the file could not be found, or read, or if the charsetName is invalid.
See Also:: parse(File, String, String)
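To illustrate what using a file's location as the base URI means, the sketch below turns a file path into a `file:` URI with the standard library and resolves a relative link against it. `FileBaseUriSketch` and `resolveAgainstFile` are hypothetical names; the exact base URI string jsoup builds from the file is not stated on this page, so treat this as an assumption. The example path is Unix-style.

```java
import java.io.File;
import java.net.URI;

class FileBaseUriSketch {
    // Hypothetical helper: derive a base URI from the file's location and
    // resolve a relative reference against it.
    static URI resolveAgainstFile(File htmlFile, String relative) {
        URI base = htmlFile.getAbsoluteFile().toURI(); // e.g. file:/var/www/site/index.html
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        File page = new File("/var/www/site/index.html");
        // On a Unix-like system this prints file:/var/www/site/img/logo.png
        System.out.println(resolveAgainstFile(page, "img/logo.png"));
    }
}
```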

parseBodyFragment

public static Document parseBodyFragment(String bodyHtml,
                                         String baseUri)

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters:: bodyHtml - body HTML fragment; baseUri - URL to resolve relative URLs against.
Returns:: sane HTML document
See Also:: Document.body()

parseBodyFragment

public static Document parseBodyFragment(String bodyHtml)

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters:: bodyHtml - body HTML fragment
Returns:: sane HTML document
See Also:: Document.body()

clean

public static String clean(String bodyHtml,
                           String baseUri,
                           Whitelist whitelist)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Parameters:: bodyHtml - input untrusted HTML; baseUri - URL to resolve relative URLs against; whitelist - white-list of permitted HTML elements
Returns:: safe HTML
See Also:: Cleaner.clean(Document)

clean

public static String clean(String bodyHtml,
                           Whitelist whitelist)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Parameters:: bodyHtml - input untrusted HTML; whitelist - white-list of permitted HTML elements
Returns:: safe HTML
See Also:: Cleaner.clean(Document)
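The Whitelist class itself is not documented on this page. As a hypothetical model of "a white-list of permitted tags and attributes", the sketch below stores allowed attributes per tag and answers membership queries. `TagWhitelist` and its methods are invented for illustration; this is not jsoup's Whitelist API, and it is not an HTML sanitizer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model only: answers whether a tag, or an attribute on a tag, is permitted.
class TagWhitelist {
    // Tag name -> attribute names allowed on that tag (all stored lower-case).
    private final Map<String, Set<String>> allowed = new HashMap<>();

    TagWhitelist allowTag(String tag, String... attributes) {
        Set<String> attrs = allowed.computeIfAbsent(tag.toLowerCase(Locale.ROOT), k -> new HashSet<>());
        for (String a : attributes) {
            attrs.add(a.toLowerCase(Locale.ROOT));
        }
        return this; // allow chaining
    }

    boolean isTagAllowed(String tag) {
        return allowed.containsKey(tag.toLowerCase(Locale.ROOT));
    }

    boolean isAttributeAllowed(String tag, String attribute) {
        return allowed.getOrDefault(tag.toLowerCase(Locale.ROOT), Collections.emptySet())
                      .contains(attribute.toLowerCase(Locale.ROOT));
    }
}
```

Tags and attributes not explicitly allowed are rejected by default, which is the point of a white-list as opposed to a black-list.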
