org.jsoup.examples
Class HtmlToPlainText
java.lang.Object
org.jsoup.examples.HtmlToPlainText
public class HtmlToPlainText
- extends Object
HTML to plain-text. This example program demonstrates the use of jsoup to convert HTML input to lightly-formatted
plain-text. This differs from the general goal of jsoup's .text() methods, which is to get clean data from a
scrape.
Note that this is a fairly simplistic formatter; for real-world use you will want to adapt and extend it.
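To illustrate the difference described above, here is a minimal sketch that runs the same HTML through Element.text() and through getPlainText. The class name TextVersusPlainText, its helper methods, and the sample HTML are hypothetical; it assumes the jsoup jar on the classpath includes the org.jsoup.examples package.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Element;

// Hypothetical demo class; not part of jsoup.
public class TextVersusPlainText {

    // Clean, flattened text, as returned by jsoup's text() method.
    public static String flat(String html) {
        Element body = Jsoup.parse(html).body();
        return body.text();
    }

    // Lightly formatted text produced by the example formatter.
    public static String formatted(String html) {
        Element body = Jsoup.parse(html).body();
        return new HtmlToPlainText().getPlainText(body);
    }

    public static void main(String[] args) {
        String html = "<p>One</p><p>Two</p>"; // sample input, chosen for illustration
        System.out.println("text():         [" + flat(html) + "]");
        System.out.println("getPlainText(): [" + formatted(html) + "]");
    }
}
```

For block-level input such as paragraphs, text() returns a single line, while the formatter is expected to keep line breaks between blocks.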
To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:
java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]
where url is the URL to fetch, and selector is an optional CSS selector.
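The url and optional selector arguments suggest a flow of: load a document, narrow it with the selector if one is given, then format each match. The sketch below approximates that flow from code. It is an assumption about how the pieces fit together, not a copy of main: the class SelectorDemo and its format method are hypothetical, and it parses an inline HTML string instead of fetching a URL so that it runs without network access.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; not part of jsoup.
public class SelectorDemo {

    // Formats the whole document when selector is null,
    // otherwise formats each element matching the CSS selector.
    public static List<String> format(String html, String selector) {
        Document doc = Jsoup.parse(html);
        HtmlToPlainText formatter = new HtmlToPlainText();
        List<String> results = new ArrayList<>();
        if (selector == null) {
            results.add(formatter.getPlainText(doc));
        } else {
            for (Element match : doc.select(selector)) {
                results.add(formatter.getPlainText(match));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample input, chosen for illustration only.
        String html = "<div class=a><p>x</p></div><div class=b><p>y</p></div>";
        for (String text : format(html, "div.a")) {
            System.out.println(text);
        }
    }
}
```

To work against a live page, the Jsoup.parse call could be swapped for Jsoup.connect(url).get(), which throws IOException on network failure.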
- Author:
- Jonathan Hedley
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HtmlToPlainText
public HtmlToPlainText()
main
public static void main(String... args) throws IOException
- Throws:
- IOException
getPlainText
public String getPlainText(Element element)
- Format an Element to plain-text
- Parameters:
- element - the root element to format
- Returns:
- formatted text
Copyright © 2009-2015 Jonathan Hedley. All Rights Reserved.