org.jsoup.safety
Class Whitelist

java.lang.Object
  extended by org.jsoup.safety.Whitelist

public class Whitelist
extends Object

Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.

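Start with one of the defaults: none(), simpleText(), basic(), basicWithImages(), or relaxed().

If you need to allow more through (please be careful!), tweak a base whitelist with addTags(), addAttributes(), addEnforcedAttribute(), and addProtocols().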

The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user-supplied HTML into a templated page), and not a full HTML document. If you do need to clean a full document, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.

If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See http://ha.ckers.org/xss.html for some XSS attack examples.

Author:
Jonathan Hedley

Constructor Summary
Whitelist()
          Create a new, empty whitelist.
 
Method Summary
 Whitelist addAttributes(String tag, String... keys)
          Add a list of allowed attributes to a tag.
 Whitelist addEnforcedAttribute(String tag, String key, String value)
          Add an enforced attribute to a tag.
 Whitelist addProtocols(String tag, String key, String... protocols)
          Add allowed URL protocols for an element's URL attribute.
 Whitelist addTags(String... tags)
          Add a list of allowed elements to a whitelist.
static Whitelist basic()
          This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.
static Whitelist basicWithImages()
          This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.
static Whitelist none()
          This whitelist allows only text nodes: all HTML will be stripped.
static Whitelist relaxed()
          This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

static Whitelist simpleText()
          This whitelist allows only simple text formatting: b, em, i, strong, u.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Whitelist

public Whitelist()
Create a new, empty whitelist. Generally it will be better to start with a default prepared whitelist instead.

See Also:
basic(), basicWithImages(), simpleText(), relaxed()
Method Detail

none

public static Whitelist none()
This whitelist allows only text nodes: all HTML will be stripped.

Returns:
whitelist

simpleText

public static Whitelist simpleText()
This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed.

Returns:
whitelist

basic

public static Whitelist basic()
This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.

Links (a elements) can point to http, https, ftp, and mailto URLs, and have an enforced rel=nofollow attribute.

Does not allow images.

Returns:
whitelist

basicWithImages

public static Whitelist basicWithImages()
This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.

Returns:
whitelist

relaxed

public static Whitelist relaxed()
This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

Returns:
whitelist

addTags

public Whitelist addTags(String... tags)
Add a list of allowed elements to a whitelist. (If a tag is not allowed, it will be removed from the HTML.)

Parameters:
tags - tag names to allow
Returns:
this (for chaining)

addAttributes

public Whitelist addAttributes(String tag,
                               String... keys)
Add a list of allowed attributes to a tag. (If an attribute is not allowed on an element, it will be removed.)

To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class").

Parameters:
tag - The tag the attributes are for
keys - List of valid attributes for the tag
Returns:
this (for chaining)
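
The lookup rule described for addAttributes (an attribute is kept if it is listed for the element's own tag or for the pseudo tag :all) can be pictured with a small stand-alone model. This is only a sketch of the idea under that assumption, not jsoup's actual implementation; the class AttributeFilterSketch and its isAllowed method are hypothetical names invented for the illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model, NOT part of jsoup.
class AttributeFilterSketch {
    // tag name (or the pseudo tag ":all") -> attribute keys allowed on it
    private final Map<String, Set<String>> allowed = new HashMap<>();

    // Mirrors the shape of Whitelist.addAttributes: returns this for chaining.
    AttributeFilterSketch addAttributes(String tag, String... keys) {
        allowed.computeIfAbsent(tag, t -> new HashSet<>()).addAll(Arrays.asList(keys));
        return this;
    }

    // An attribute passes if it is listed for the tag itself or for ":all".
    boolean isAllowed(String tag, String key) {
        Set<String> forTag = allowed.getOrDefault(tag, Collections.<String>emptySet());
        Set<String> forAll = allowed.getOrDefault(":all", Collections.<String>emptySet());
        return forTag.contains(key) || forAll.contains(key);
    }

    public static void main(String[] args) {
        AttributeFilterSketch filter = new AttributeFilterSketch()
                .addAttributes("a", "href", "title")
                .addAttributes(":all", "class");
        System.out.println(filter.isAllowed("a", "href"));    // true
        System.out.println(filter.isAllowed("p", "class"));   // true, via :all
        System.out.println(filter.isAllowed("p", "href"));    // false
    }
}
```

In this model, anything not explicitly listed is rejected, which matches the allow-list approach the class description calls for.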

addEnforcedAttribute

public Whitelist addEnforcedAttribute(String tag,
                                      String key,
                                      String value)
Add an enforced attribute to a tag. An enforced attribute will always be added to the element. If the element already has the attribute set, it will be overridden.

E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">

Parameters:
tag - The tag the enforced attribute is for
key - The attribute key
value - The enforced attribute value
Returns:
this (for chaining)
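
The override behaviour described for addEnforcedAttribute (the attribute is always present on output, and any existing value is replaced) can be modelled on a plain attribute map. This is a minimal sketch of that rule only, not jsoup's code; EnforcedAttributeSketch and enforce are hypothetical names, and representing an element's attributes as a Map is an assumption made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone model, NOT part of jsoup.
class EnforcedAttributeSketch {
    // Returns a copy of the element's attributes with key forced to value.
    // An existing value for key is overridden; other attributes are untouched.
    static Map<String, String> enforce(Map<String, String> attrs, String key, String value) {
        Map<String, String> result = new LinkedHashMap<>(attrs);
        result.put(key, value);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> link = new LinkedHashMap<>();
        link.put("href", "http://example.com/");
        link.put("rel", "follow"); // user-supplied value that should not survive
        System.out.println(enforce(link, "rel", "nofollow"));
        // {href=http://example.com/, rel=nofollow}
    }
}
```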

addProtocols

public Whitelist addProtocols(String tag,
                              String key,
                              String... protocols)
Add allowed URL protocols for an element's URL attribute. This restricts the possible values of the attribute to URLs that use one of the defined protocols.

E.g.: addProtocols("a", "href", "ftp", "http", "https")

Parameters:
tag - Tag the URL protocol is for
key - Attribute key
protocols - List of valid protocols
Returns:
this (for chaining)
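
To see why restricting protocols matters, here is a stand-alone sketch of a protocol check in the spirit of addProtocols: an attribute value passes only if it begins with one of the listed protocols followed by a colon. It is an illustration of the idea, not jsoup's implementation; ProtocolCheckSketch and isAllowed are hypothetical names. As a simplifying assumption, the sketch rejects any value with no listed protocol, including relative URLs; how the real cleaner treats relative URLs is not covered here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone model, NOT part of jsoup.
class ProtocolCheckSketch {
    private final Set<String> allowed = new HashSet<>();

    ProtocolCheckSketch(String... protocols) {
        for (String p : protocols) {
            allowed.add(p.toLowerCase(Locale.ROOT));
        }
    }

    // True if the value starts with an allowed protocol followed by ':'.
    // Comparison is case-insensitive and ignores surrounding whitespace.
    boolean isAllowed(String url) {
        String value = url.trim().toLowerCase(Locale.ROOT);
        for (String protocol : allowed) {
            if (value.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ProtocolCheckSketch check = new ProtocolCheckSketch("ftp", "http", "https");
        System.out.println(check.isAllowed("https://example.com/")); // true
        System.out.println(check.isAllowed("javascript:alert(1)"));  // false
    }
}
```

A value such as javascript:alert(1) is a typical XSS vector in URL attributes, which is why only explicitly listed protocols are accepted.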


Copyright © 2009-2010 Jonathan Hedley. All Rights Reserved.