org.opencms.util
Class CmsHtmlExtractor

java.lang.Object
  extended by org.opencms.util.CmsHtmlExtractor

public final class CmsHtmlExtractor
extends Object

Extracts plain text from HTML.

Since:
6.0.0

Method Summary
static String extractText(InputStream in, String encoding)
          Extracts the text from an HTML page read from an input stream.
static String extractText(String content, String encoding)
          Extracts the text from an HTML page given as a String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

extractText

public static String extractText(InputStream in,
                                 String encoding)
                          throws org.htmlparser.util.ParserException,
                                 UnsupportedEncodingException
Extracts the text from an HTML page read from an input stream.

Parameters:
in - the input stream supplying the HTML content
encoding - the character encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
UnsupportedEncodingException - if the given encoding is not supported
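
Example (a minimal sketch, not part of the OpenCms API): the stream overload takes an encoding name, so a caller may want to check that name and build the stream before handing both over. The class ExtractTextStreamSketch and its methods isUsableEncoding and toStream are hypothetical helpers that use only the JDK. The actual call to extractText appears as a comment because it needs OpenCms and HTML Parser on the classpath. Passing this check does not guarantee that extractText accepts the encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of OpenCms.
class ExtractTextStreamSketch {

    /** Returns true if this JVM knows the named encoding; false for null, malformed, or unknown names. */
    static boolean isUsableEncoding(String encoding) {
        try {
            return encoding != null && Charset.isSupported(encoding);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (a subclass) signals a malformed name
            return false;
        }
    }

    /** Encodes the HTML with the named encoding and wraps the bytes in a stream. */
    static ByteArrayInputStream toStream(String html, String encoding) {
        if (!isUsableEncoding(encoding)) {
            throw new IllegalArgumentException("Unusable encoding: " + encoding);
        }
        return new ByteArrayInputStream(html.getBytes(Charset.forName(encoding)));
    }

    public static void main(String[] args) {
        String encoding = "UTF-8";
        ByteArrayInputStream in = toStream("<p>caf\u00e9</p>", encoding);
        System.out.println(in.available() + " bytes ready for extraction");
        // With OpenCms and HTML Parser on the classpath, the call would be:
        // String text = CmsHtmlExtractor.extractText(in, encoding);
    }
}
```

The stream is encoded with the same name that is later passed as the encoding argument, so the bytes and the declared encoding agree.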

extractText

public static String extractText(String content,
                                 String encoding)
                          throws org.htmlparser.util.ParserException,
                                 UnsupportedEncodingException
Extracts the text from an HTML page given as a String.

Parameters:
content - the HTML content
encoding - the character encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
UnsupportedEncodingException - if the given encoding is not supported
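
Example (a sketch under stated assumptions, not the library's own code): both declared exceptions are checked, so callers often wrap the call in a helper with a fallback. The interface HtmlTextExtractor, the method extractOrEmpty, and the STAND_IN extractor are hypothetical names introduced so the sketch runs without OpenCms. The stand-in strips tags with a naive regular expression and does not reproduce the real parser's behavior. In a real project, CmsHtmlExtractor::extractText could be passed in its place.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical wrapper for illustration; not part of OpenCms.
class ExtractTextFallbackSketch {

    /** Same shape as extractText(String, String); lets the sketch run without OpenCms. */
    interface HtmlTextExtractor {
        String extract(String content, String encoding) throws Exception;
    }

    /** Returns the extracted text, or an empty string if extraction fails for any reason. */
    static String extractOrEmpty(HtmlTextExtractor extractor, String content, String encoding) {
        try {
            return extractor.extract(content, encoding);
        } catch (Exception e) {
            // Covers ParserException and UnsupportedEncodingException alike
            return "";
        }
    }

    /** Stand-in extractor: naive tag stripping, NOT what CmsHtmlExtractor does internally. */
    static final HtmlTextExtractor STAND_IN = (content, encoding) -> {
        if (!Charset.isSupported(encoding)) {
            throw new UnsupportedEncodingException(encoding);
        }
        return content.replaceAll("<[^>]*>", "").trim();
    };

    public static void main(String[] args) {
        System.out.println(extractOrEmpty(STAND_IN, "<p>Hello <b>world</b></p>", "UTF-8"));
        System.out.println(extractOrEmpty(STAND_IN, "<p>Hello</p>", "no-such-encoding").isEmpty());
        // With OpenCms on the classpath:
        // String text = extractOrEmpty(CmsHtmlExtractor::extractText, html, "UTF-8");
    }
}
```

Returning an empty string keeps the caller simple but hides the cause of the failure. Code that needs to tell a parse error from an encoding error should catch the two exceptions separately instead.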