public interface Encoder
The Encoder interface contains a number of methods for decoding input and encoding output
so that it will be safe for a variety of interpreters. Its primary use is to
provide output encoding to prevent XSS.
To prevent double-encoding, callers should make sure input does not already contain encoded characters
by calling one of the canonicalize() methods. Validator implementations should call
canonicalize() on user input before validating to prevent encoded attacks.
All of the methods must use an "allow list" or "positive" security model rather than a "deny list" or "negative" security model. For the encoding methods, this means that all characters should be encoded, except for a specific list of "immune" characters that are known to be safe.
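The allow-list model described above can be pictured with a minimal sketch that uses only the Java standard library. It is not ESAPI's implementation: the class name, the method name, and the contents of the "immune" set are assumptions chosen for illustration. Everything that is not an ASCII letter, a digit, or an immune character is written as an HTML hexadecimal entity.

```java
public class AllowListEncoderSketch {
    // Hypothetical "immune" set: characters passed through unchanged (an assumption, not ESAPI's list).
    private static final String IMMUNE = ",.-_ ";

    /** Encodes every code point except ASCII alphanumerics and the immune set as an HTML hex entity. */
    public static String encodeAllowList(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            boolean alphanumeric = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (alphanumeric || IMMUNE.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);            // known-safe: emit as-is
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';'); // everything else is encoded
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAllowList("<b>Hi, Bob</b>"));
    }
}
```

The point of the design is that a character nobody thought about is encoded by default, whereas a deny list would let it through.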
The Encoder performs two key functions, encoding (also referred to as "escaping" in this Javadoc)
and decoding. These functions rely on a set of codecs that can be found in the
org.owasp.esapi.codecs package. These include:
The primary use of the ESAPI Encoder is to prevent XSS vulnerabilities by
providing output encoding using the various "encodeForXYZ()" methods,
where XYZ is one of CSS, HTML, HTMLAttribute, JavaScript, or URL. When
using the ESAPI output encoders, it is important that you use the one for the
appropriate context where the output will be rendered. For example, if
the output appears in a JavaScript context, you should use encodeForJavaScript
(note this includes all of the DOM JavaScript event handler attributes such as
'onfocus', 'onclick', 'onload', etc.). If the output would be rendered in an HTML
attribute context (with the exception of the aforementioned 'onevent' type event
handler attributes), you would use encodeForHTMLAttribute. If you are
encoding anywhere a URL is expected (e.g., an 'href' attribute for an <a> tag or
a 'src' attribute on an <img> tag, etc.), then you should use encodeForURL.
If encoding CSS, then use encodeForCSS. Etc. This is because there are
different escaping requirements for these different contexts. Developers who are
new to ESAPI or to defending against XSS vulnerabilities are highly encouraged to
first read the OWASP Cross-Site Scripting Prevention Cheat Sheet.
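As a rough aid, the context guidance above can be written down as a small lookup. This sketch does not call ESAPI; it only returns the name of the Encoder method that the paragraph points to. The class, the OutputContext enum, and its constants are hypothetical labels introduced here.

```java
public class EncoderContextSketch {
    /** Hypothetical labels for where untrusted output will be rendered. */
    public enum OutputContext { HTML_BODY, HTML_ATTRIBUTE, EVENT_HANDLER_ATTRIBUTE, JAVASCRIPT, URL, CSS }

    /** Returns the name of the Encoder method suggested for the given rendering context. */
    public static String encoderMethodFor(OutputContext context) {
        switch (context) {
            case HTML_BODY:
                return "encodeForHTML";
            case HTML_ATTRIBUTE:
                return "encodeForHTMLAttribute";
            case EVENT_HANDLER_ATTRIBUTE: // onfocus, onclick, onload, ... are JavaScript contexts
            case JAVASCRIPT:
                return "encodeForJavaScript";
            case URL:
                return "encodeForURL";
            case CSS:
                return "encodeForCSS";
            default:
                throw new IllegalArgumentException("Unknown context: " + context);
        }
    }

    public static void main(String[] args) {
        for (OutputContext c : OutputContext.values()) {
            System.out.println(c + " -> " + encoderMethodFor(c));
        }
    }
}
```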
Note that in addition to these encoder methods, ESAPI also provides a JSP Tag
Library (META-INF/esapi.tld) in the ESAPI jar. This allows one to use
the more convenient JSP tags in JSPs. These JSP tags are simply wrappers for the
various "encodeForXYZ()" methods documented in this Encoder interface.
Some important final words:
mailto: URL?
Then instead of HTML encoding, it would need to have URL encoding. Similarly,
what if there is a later switch made to use AJAX and the untrusted email
address gets used in a JavaScript context? The complication is that even if
you know with certainty today all the ways that an untrusted data item is
used in your application, it is generally impossible to predict all the
contexts that it may be used in the future, not only in your application, but
in other applications that could access that data in the database.
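The pitfall of encoding before storage can be sketched with standard-library calls only. The helper methods below are simplified stand-ins, not ESAPI encoders: htmlEncode escapes just five characters, and urlEncode relies on java.net.URLEncoder, which performs form-style percent encoding. The email address is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeAtSinkSketch {
    /** Minimal HTML escaping of five significant characters; a stand-in for a real HTML encoder. */
    public static String htmlEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#x27;");
    }

    /** Form-style percent encoding; a stand-in for a real URL encoder. */
    public static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String raw = "a&b@example.com";          // untrusted value as received
        String storedEncoded = htmlEncode(raw);   // anti-pattern: HTML-encode, then store

        // Later the stored value is needed in a URL context instead of an HTML context.
        System.out.println(urlEncode(storedEncoded)); // carries a stray "amp;" from the earlier HTML encoding
        System.out.println(urlEncode(raw));           // raw value encoded at the sink for the right context
    }
}
```

Storing the raw value and encoding at the point of output keeps every future context available.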
</script>alert(1)</script> or similar simplistic XSS attack payloads and if that is properly encoded (or you don't see an alert box popped in your browser), you consider it "problem fixed" and consider the unit testing sufficient. Unfortunately, that minimalist testing may not always detect places where you used the wrong output encoder. You need to do better. Fortunately, the aforementioned link, Automated Detecting and Repair of Cross-Site Scripting Vulnerabilities through Unit Testing, provides some insight on this. You may also wish to look at the ESAPI Encoder JUnit test cases for ideas. If you are really ambitious, an excellent resource for XSS attack patterns is BeEF - The Browser Exploitation Framework Project.
Modifier and Type | Method and Description |
---|---|
String | canonicalize(String input): This method is equivalent to calling Encoder.canonicalize(input, restrictMultiple, restrictMixed); |
String | canonicalize(String input, boolean strict): This method is equivalent to calling Encoder.canonicalize(input, strict, strict); |
String | canonicalize(String input, boolean restrictMultiple, boolean restrictMixed): Canonicalization is simply the operation of reducing a possibly encoded string down to its simplest form. |
String | decodeForHTML(String input): Decodes HTML entities. |
byte[] | decodeFromBase64(String input): Decode data encoded with BASE-64 encoding. |
String | decodeFromJSON(String input): Decode data encoded for JSON strings. |
String | decodeFromURL(String input): Decode from URL. |
String | encodeForBase64(byte[] input, boolean wrap): Encode for Base64. |
String | encodeForCSS(String untrustedData): Encode data for use in Cascading Style Sheets (CSS) content. |
String | encodeForDN(String input): Encode data for use in an LDAP distinguished name. |
String | encodeForHTML(String untrustedData): Encode data for use in HTML using HTML entity encoding. |
String | encodeForHTMLAttribute(String untrustedData): Encode data for use in HTML attributes. |
String | encodeForJavaScript(String untrustedData): Encode data for insertion inside a data value or function argument in JavaScript. |
String | encodeForJSON(String input): Encode data for use in JSON strings. |
String | encodeForLDAP(String input): Encode data for use in LDAP queries. |
String | encodeForLDAP(String input, boolean encodeWildcards): Encode data for use in LDAP queries. |
String | encodeForOS(Codec codec, String input): Encode for an operating system command shell according to the selected codec (appropriate codecs include the WindowsCodec and UnixCodec). |
String | encodeForSQL(Codec codec, String input): Encode input for use in a SQL query, according to the selected codec (appropriate codecs include the MySQLCodec and OracleCodec). |
String | encodeForURL(String input): Encode for use in a URL. |
String | encodeForVBScript(String untrustedData): Encode data for insertion inside a data value in a Visual Basic script. |
String | encodeForXML(String input): Encode data for use in an XML element. |
String | encodeForXMLAttribute(String input): Encode data for use in an XML attribute. |
String | encodeForXPath(String input): Encode data for use in an XPath query. |
String | getCanonicalizedURI(URI dirtyUri): Get a version of the input URI that will be safe to run regex and other validations against. |

String canonicalize(String input)
This method is equivalent to calling Encoder.canonicalize(input, restrictMultiple, restrictMixed);

The default values for restrictMultiple and restrictMixed come from ESAPI.properties:

    Encoder.AllowMultipleEncoding=false
    Encoder.AllowMixedEncoding=false

and the default codecs that are used for canonicalization are the list of codecs that comes from:

    Encoder.DefaultCodecList=HTMLEntityCodec,PercentCodec,JavaScriptCodec

(If the Encoder.DefaultCodecList property is null or not set, these same codecs are used in the same order. Note that you may supply
your own codec by using a fully qualified class name of a class that
implements org.owasp.esapi.codecs.Codec<T>.)
Parameters:
input - the text to canonicalize
See Also:
canonicalize(String, boolean, boolean), W3C specifications

String canonicalize(String input, boolean strict)
This method is equivalent to calling Encoder.canonicalize(input, strict, strict);
Parameters:
input - the text to canonicalize
strict - true if checking for multiple and mixed encoding is desired, false otherwise
See Also:
canonicalize(String, boolean, boolean), W3C specifications

String canonicalize(String input, boolean restrictMultiple, boolean restrictMixed)
Everyone says you shouldn't do validation without canonicalizing the data first. This is easier said than done. The canonicalize method can be used to simplify just about any input down to its most basic form. Note that canonicalize doesn't handle Unicode issues; it focuses on higher level encoding and escaping schemes. In addition to simple decoding, canonicalize also handles:
Using canonicalize is simple. The default is just...

    String clean = ESAPI.encoder().canonicalize( request.getParameter("input"));

You need to decode untrusted data so that it's safe for ANY downstream interpreter or decoder. For example, if your data goes into a Windows command shell, then into a database, and then to a browser, you're going to need to decode for all of those systems. You can build a custom encoder to canonicalize for your application like this...

    ArrayList list = new ArrayList();
    list.add( new WindowsCodec() );
    list.add( new MySQLCodec() );
    list.add( new PercentCodec() );
    Encoder encoder = new DefaultEncoder( list );
    String clean = encoder.canonicalize( request.getParameter( "input" ));

or alternately, you can just customize the Encoder.DefaultCodecList property
in the ESAPI.properties file with your preferred codecs; for example:

    Encoder.DefaultCodecList=WindowsCodec,MySQLCodec,PercentCodec

and then use:

    Encoder encoder = ESAPI.encoder();
    String clean = encoder.canonicalize( request.getParameter( "input" ));

as you normally would. However, the downside of the ESAPI.properties
file approach is that it does not allow you to vary the
list of codecs that are used each time. The downside of using the
DefaultEncoder constructor is that your code is now tied to
specific reference implementations rather than just interfaces, and those
reference implementations are what is most likely to change in ESAPI 3.x.
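To make the property-driven approach concrete, here is a minimal sketch of reading a codec list from a java.util.Properties object and falling back to the documented defaults when the property is missing. It only illustrates the behavior described in this Javadoc; it is not how ESAPI itself loads its configuration, and the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class CodecListPropertySketch {
    // Default list quoted in this Javadoc.
    static final String DEFAULT_CODECS = "HTMLEntityCodec,PercentCodec,JavaScriptCodec";

    /** Returns the configured codec names, or the defaults when the property is unset or blank. */
    public static List<String> codecNames(Properties props) {
        String value = props.getProperty("Encoder.DefaultCodecList");
        if (value == null || value.trim().isEmpty()) {
            value = DEFAULT_CODECS;
        }
        return Arrays.asList(value.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(codecNames(props)); // defaults, because nothing is configured
        props.setProperty("Encoder.DefaultCodecList", "WindowsCodec,MySQLCodec,PercentCodec");
        System.out.println(codecNames(props)); // custom list, in the configured order
    }
}
```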
In ESAPI, the Validator uses the canonicalize method before it does validation. So all you need to
do is to validate as normal and you'll be protected against a host of encoded attacks.

    String input = request.getParameter( "name" );
    String name = ESAPI.validator().getValidInput( "test", input, "FirstName", 20, false);

However, the default canonicalize() method only decodes HTMLEntity, percent (URL) encoding, and JavaScript encoding. If you'd like to use a custom canonicalizer with your validator, that's pretty easy too.

    // ... set up custom encoder as above
    Validator validator = new DefaultValidator( encoder );
    String input = request.getParameter( "name" );
    String name = validator.getValidInput( "test", input, "name", 20, false);

Although ESAPI is able to canonicalize multiple, mixed, or nested encoding, it's safer to not accept this stuff in the first place. In ESAPI, the default is "strict" mode that throws an IntrusionException if it receives anything not single-encoded with a single scheme. This is configurable in ESAPI.properties using the properties:

    Encoder.AllowMultipleEncoding=false
    Encoder.AllowMixedEncoding=false

This method allows you to override the default behavior by directly specifying whether to restrict multiple or mixed encoding. Even if you disable restrictions, you'll still get warning messages in the log about each multiple encoding and mixed encoding received.

    // disabling strict mode to allow mixed encoding
    String url = ESAPI.encoder().canonicalize( request.getParameter("url"), false, false);

WARNING #1!!! Please note that this method is incompatible with URLs and if there exist any HTML Entities that correspond with parameter values in a URL such as "&para;" in a URL like "https://foo.com/?bar=foo&parameter=wrong" you will get a mixed encoding validation exception.
If you wish to canonicalize a URL/URI use the method Encoder.getCanonicalizedURI(URI dirtyUri);
WARNING #2!!! Even if you use WindowsCodec or UnixCodec as appropriate, file path names in the input
parameter will NOT be canonicalized. If the failure of such file path name canonicalization
presents a potential security issue, consider using one of the
Validator.getValidDirectoryPath() methods instead of or in addition to this method.
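The idea of decoding until the input stops changing, and rejecting input that needed more than one pass, can be sketched for a single scheme. The code below handles percent encoding only and uses IllegalArgumentException as a stand-in for ESAPI's IntrusionException. It is a simplified illustration under those assumptions, not the reference implementation, and it does not attempt mixed-encoding detection.

```java
public class CanonicalizeSketch {
    /** Decodes each well-formed %XX sequence once; malformed sequences are left untouched. */
    static String percentDecodeOnce(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Decodes repeatedly until stable; rejects input needing more than one pass when restrictMultiple is true. */
    public static String canonicalize(String input, boolean restrictMultiple) {
        String current = input;
        int passes = 0;
        while (true) {
            String decoded = percentDecodeOnce(current);
            if (decoded.equals(current)) {
                break; // nothing left to decode
            }
            current = decoded;
            passes++;
        }
        if (passes > 1 && restrictMultiple) {
            throw new IllegalArgumentException("Multiple (" + passes + "x) encoding detected");
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("%3Cscript%3E", true));      // single encoding is accepted
        System.out.println(canonicalize("%253Cscript%253E", false)); // double encoding, restriction disabled
    }
}
```

With restrictMultiple set to true, the second input in main would be rejected instead of decoded.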
Parameters:
input - the text to canonicalize
restrictMultiple - true if checking for multiple encoding is desired, false otherwise
restrictMixed - true if checking for mixed encoding is desired, false otherwise
See Also:
canonicalize(String), getCanonicalizedURI(URI dirtyUri), Validator.getValidDirectoryPath(java.lang.String, java.lang.String, java.io.File, boolean)

String encodeForCSS(String untrustedData)
Parameters:
untrustedData - the untrusted data to output encode for CSS

String encodeForHTML(String untrustedData)
Note that the following characters: 00-08, 0B-0C, 0E-1F, and 7F-9F
cannot be used in HTML.
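The ranges in the note above translate directly into a predicate. This is a minimal sketch; the class and method names are hypothetical, and it says nothing about what an encoder does with such characters once found.

```java
public class HtmlInvalidCharSketch {
    /** True for the code points listed above as unusable in HTML (hexadecimal ranges). */
    public static boolean isInvalidInHtml(int cp) {
        return (cp >= 0x00 && cp <= 0x08)
            || (cp >= 0x0B && cp <= 0x0C)
            || (cp >= 0x0E && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x9F);
    }

    /** True if any code point of the input falls in one of the listed ranges. */
    public static boolean containsInvalid(String input) {
        return input.codePoints().anyMatch(HtmlInvalidCharSketch::isInvalidInHtml);
    }

    public static void main(String[] args) {
        System.out.println(containsInvalid("plain text"));      // no listed characters
        System.out.println(containsInvalid("bell\u0007char"));  // contains 0x07
    }
}
```

Note that tab (09), line feed (0A), and carriage return (0D) fall outside the listed ranges.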
Parameters:
untrustedData - the untrusted data to output encode for HTML

String decodeForHTML(String input)
Parameters:
input - the String to decode

String encodeForHTMLAttribute(String untrustedData)
Parameters:
untrustedData - the untrusted data to output encode for an HTML attribute

String encodeForJavaScript(String untrustedData)
<script> window.setInterval('<%= EVEN IF YOU ENCODE UNTRUSTED DATA YOU ARE XSSED HERE %>'); </script>
Parameters:
untrustedData - the untrusted data to output encode for JavaScript

String encodeForVBScript(String untrustedData)
Parameters:
untrustedData - the untrusted data to output encode for VBScript

String encodeForSQL(Codec codec, String input)
The PreparedStatement interface is the preferred approach. However, if for some reason
this is impossible, then this method is provided as a weaker alternative.
The best approach is to make sure any single quotes are doubled (that is, each single quote is written as two single quotes).
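Quote doubling itself is a one-line transformation, shown here as a minimal standard-library sketch. The table and column names are hypothetical. This is the weaker alternative the text refers to: it does not account for database-specific escapes such as backslash handling in MySQL, and PreparedStatement remains the preferred approach.

```java
public class SingleQuoteDoublingSketch {
    /** Doubles every single quote so the value cannot terminate a SQL string literal early. */
    public static String doubleSingleQuotes(String input) {
        return input.replace("'", "''");
    }

    public static void main(String[] args) {
        String lastName = "O'Brien"; // example input containing a single quote
        // Hypothetical query text, built by concatenation only for illustration.
        String sql = "SELECT * FROM users WHERE last_name = '" + doubleSingleQuotes(lastName) + "'";
        System.out.println(sql);
    }
}
```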
Another possible approach is to use the {escape} syntax described in the
JDBC specification in section 1.5.6.
However, this syntax does not work with all drivers, and requires
modification of all queries.
Parameters:
codec - a Codec that declares which database 'input' is being encoded for (e.g., MySQL or Oracle)
input - the text to encode for SQL

String encodeForOS(Codec codec, String input)
Parameters:
codec - a Codec that declares which operating system 'input' is being encoded for (e.g., Windows or Unix)
input - the text to encode for the command shell

String encodeForLDAP(String input)
encodeForLDAP began strict conformance with RFC 4515. Characters above 0x7F
are converted to UTF-8, and then the byte sequences are hex encoded according to the RFC.
Parameters:
input - the text to encode for LDAP

String encodeForLDAP(String input, boolean encodeWildcards)
encodeForLDAP began strict conformance with RFC 4515. Characters above 0x7F
are converted to UTF-8, and then the byte sequences are hex encoded according to the RFC.
Parameters:
input - the text to encode for LDAP
encodeWildcards - whether or not wildcard (*) characters will be encoded.

String encodeForDN(String input)
encodeForDN began strict conformance with RFC 4514. Characters above 0x7F
are converted to UTF-8, and then the byte sequences are hex encoded according to the RFC.
Parameters:
input - the text to encode for an LDAP distinguished name

String encodeForXPath(String input)
Parameters:
input - the text to encode for XPath

String encodeForXML(String input)
The use of a real XML parser is strongly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.
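One way to read the advice above is to let the JDK's own XML APIs build and serialize the document, so that escaping is handled by the serializer rather than by string concatenation. The sketch below uses only JAXP classes from the standard library. The element name "comment" and the class name are hypothetical, and factory hardening options are omitted for brevity.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlApiSketch {
    /** Places untrusted text in a DOM text node and serializes the document to a string. */
    public static String toXml(String untrusted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element comment = doc.createElement("comment"); // hypothetical element name
            comment.setTextContent(untrusted);               // stored as character data, never parsed as markup
            doc.appendChild(comment);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("<b>Tom & Jerry</b>"));
    }
}
```

The markup characters in the input come out as entity references in the serialized text.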
Parameters:
input - the text to encode for XML

String encodeForXMLAttribute(String input)
The use of a real XML parser is highly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.
Parameters:
input - the text to encode for use as an XML attribute

String encodeForURL(String input) throws EncodingException
Parameters:
input - the text to encode for use in a URL
Throws:
EncodingException - if encoding fails

String encodeForJSON(String input)
Parameters:
input - the text to escape for JSON string

String decodeFromURL(String input) throws EncodingException
Parameters:
input - the text to decode from an encoded URL
Throws:
EncodingException - if decoding fails

String encodeForBase64(byte[] input, boolean wrap)
Parameters:
input - the text to encode for Base64
wrap - the encoder will wrap lines every 64 characters of output

byte[] decodeFromBase64(String input) throws IOException
Parameters:
input - the Base64 text to decode
Throws:
IOException

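The effect of the wrap flag can be illustrated with java.util.Base64 from the standard library. This is a sketch of the general technique, not ESAPI's implementation; the class and method names are assumptions, and the newline used as the line separator is this sketch's choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64WrapSketch {
    /** Encodes bytes as Base64, optionally breaking the output into lines of at most 64 characters. */
    public static String encode(byte[] input, boolean wrap) {
        Base64.Encoder encoder = wrap
                ? Base64.getMimeEncoder(64, "\n".getBytes(StandardCharsets.US_ASCII))
                : Base64.getEncoder();
        return encoder.encodeToString(input);
    }

    /** Decodes Base64 text; the MIME decoder ignores line separators, so both forms are accepted. */
    public static byte[] decode(String input) {
        return Base64.getMimeDecoder().decode(input);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        Arrays.fill(data, (byte) 'x');
        String wrapped = encode(data, true);
        System.out.println(wrapped);
        System.out.println(Arrays.equals(data, decode(wrapped))); // round trip check
    }
}
```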
String getCanonicalizedURI(URI dirtyUri)
Parameters:
dirtyUri - the tainted URI

String decodeFromJSON(String input)
Parameters:
input - the JSON string to decode

Copyright © 2023 The Open Worldwide Application Security Project (OWASP). All rights reserved.