java.lang.Object
  org.jsoup.Jsoup
public class Jsoup
The core public access point to the jsoup functionality.
Method Summary | |
---|---|
static String | clean(String bodyHtml, String baseUri, Whitelist whitelist): Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
static String | clean(String bodyHtml, Whitelist whitelist): Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
static Document | parse(File in, String charsetName): Parse the contents of a file as HTML. |
static Document | parse(File in, String charsetName, String baseUri): Parse the contents of a file as HTML. |
static Document | parse(String html): Parse HTML into a Document. |
static Document | parse(String html, String baseUri): Parse HTML into a Document. |
static Document | parse(URL url, int timeoutMillis): Fetch a URL, and parse it as HTML. |
static Document | parseBodyFragment(String bodyHtml): Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
static Document | parseBodyFragment(String bodyHtml, String baseUri): Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static Document parse(String html, String baseUri)
Parse HTML into a Document.
Parameters:
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs that occur before the HTML declares a <base href> tag into absolute URLs.
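
As a minimal sketch of how the baseUri parameter affects URL resolution, the example below parses a snippet containing relative links and asks for the absolute form of the first one. The class name `BaseUriExample`, the helper `firstLinkAbsUrl`, and the example.com URLs are hypothetical; `select`, `first`, and `absUrl` are assumed to be available on the jsoup version in use.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriExample {
    // Hypothetical helper: absolute URL of the first <a> in the HTML,
    // or an empty string when the HTML contains no link.
    static String firstLinkAbsUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs/index.html\">Docs</a></p>";
        // The relative href is resolved against the supplied base URI.
        System.out.println(firstLinkAbsUrl(html, "http://example.com/blog/"));
    }
}
```

Passing the page's original address as baseUri is usually the safest choice when the HTML was downloaded separately and then parsed from a string.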
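public static Document parse(String html)
Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
Parameters:
html - HTML to parse
See Also:
parse(String, String)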
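
The single-argument overload can be illustrated with a document that declares its own <base href>. This is a sketch under assumptions: `BaseHrefExample` and `firstLinkAbsUrl` are hypothetical names, and the behaviour for a document without a base (an empty string from `absUrl`) reflects how current jsoup releases behave rather than anything stated on this page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseHrefExample {
    // Hypothetical helper: parse without a base URI and resolve the first link.
    static String firstLinkAbsUrl(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String withBase = "<html><head><base href=\"http://example.com/dir/\"></head>"
                + "<body><a href=\"page.html\">Page</a></body></html>";
        // The <base href> declared in the HTML supplies the missing base URI.
        System.out.println(firstLinkAbsUrl(withBase));
    }
}
```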
public static Document parse(URL url, int timeoutMillis) throws IOException
Fetch a URL, and parse it as HTML.
Parameters:
url - URL to fetch (with a GET). The protocol must be http or https.
timeoutMillis - Connection and read timeout, in milliseconds. If exceeded, IOException is thrown.
Throws:
IOException - If the final server response is not 200 OK (redirects are followed), or if there is an error reading the response stream.
public static Document parse(File in, String charsetName, String baseUri) throws IOException
Parse the contents of a file as HTML.
Parameters:
in - file to load HTML from
charsetName - character set of file contents. If you don't know the charset, UTF-8 is generally the best guess.
baseUri - The URL where the HTML was retrieved from, used to resolve relative URLs into absolute URLs.
Throws:
IOException - if the file could not be found or read, or if the charsetName is invalid.
public static Document parse(File in, String charsetName) throws IOException
Parse the contents of a file as HTML.
Parameters:
in - file to load HTML from
charsetName - character set of file contents. If you don't know the charset, UTF-8 is generally the best guess.
Throws:
IOException - if the file could not be found or read, or if the charsetName is invalid.
See Also:
parse(File, String, String)
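
The I/O-backed overloads above (URL and File) all declare IOException, so callers need to handle it. The sketch below shows one way to do that; the class `IoParseExample` and its helpers are hypothetical, the temporary file and example.com base URI are placeholders, and `Document.title()` is assumed to exist in the jsoup version in use. The `fetchTitle` helper needs network access and is not exercised by `main`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class IoParseExample {
    // Parse a local file as UTF-8, resolving relative URLs against baseUri.
    static String titleOf(File in, String baseUri) throws IOException {
        Document doc = Jsoup.parse(in, "UTF-8", baseUri);
        return doc.title();
    }

    // Fetch a page over http/https with a single connect-and-read timeout.
    static String fetchTitle(String address, int timeoutMillis) throws IOException {
        Document doc = Jsoup.parse(URI.create(address).toURL(), timeoutMillis);
        return doc.title();
    }

    // Demo helper: write HTML to a temp file, parse it back, return the title.
    static String titleViaTempFile(String html) {
        try {
            Path tmp = Files.createTempFile("jsoup-example", ".html");
            try {
                Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
                return titleOf(tmp.toFile(), "http://example.com/");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        System.out.println(titleViaTempFile(html));
    }
}
```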
public static Document parseBodyFragment(String bodyHtml, String baseUri)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
baseUri - URL to resolve relative URLs against.
See Also:
Document.body()
public static Document parseBodyFragment(String bodyHtml)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
See Also:
Document.body()
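
A short sketch of the fragment parsers follows: the fragment is wrapped in a full Document, and Document.body() gives access to the parsed content. `BodyFragmentExample` and its helpers are hypothetical names, and `Element.text()` and `absUrl` are assumed to be available in the jsoup version in use.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BodyFragmentExample {
    // Text content of a body fragment, read back through Document.body().
    static String fragmentText(String bodyHtml) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml);
        return doc.body().text();
    }

    // Same idea with a base URI, so relative links in the fragment can be resolved.
    static String firstLinkAbsUrl(String bodyHtml, String baseUri) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml, baseUri);
        Element link = doc.body().select("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // Unclosed tags in the fragment are balanced by the parser.
        System.out.println(fragmentText("<p>Hello <b>world"));
        System.out.println(firstLinkAbsUrl("<a href=\"/about\">About</a>", "http://example.com/"));
    }
}
```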
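public static String clean(String bodyHtml, String baseUri, Whitelist whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
See Also:
Cleaner.clean(Document)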
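public static String clean(String bodyHtml, Whitelist whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML
whitelist - white-list of permitted HTML elements
See Also:
Cleaner.clean(Document)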
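
The cleaning methods can be sketched as follows. This assumes a jsoup version that still ships `org.jsoup.safety.Whitelist` (newer releases rename it `Safelist`) and that the `Whitelist.basic()` factory is available; neither is documented on this page. `CleanExample` and `sanitize` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class CleanExample {
    // Filter untrusted HTML through a basic white-list (assumed factory method).
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Whitelist.basic());
    }

    public static void main(String[] args) {
        String dirty = "<p>Hi <script>alert(1)</script><b>there</b> "
                + "<a href=\"http://example.com/\" onclick=\"steal()\">link</a></p>";
        // Tags and attributes outside the white-list are dropped from the output.
        System.out.println(sanitize(dirty));
    }
}
```

The exact whitespace of the returned HTML may vary between jsoup versions, so it is safer to check for the absence of unwanted markup than to compare the whole string.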