org.jsoup.examples
Class HtmlToPlainText

java.lang.Object
  extended by org.jsoup.examples.HtmlToPlainText

public class HtmlToPlainText
extends Object

Converts HTML to plain text. This example program demonstrates using jsoup to convert HTML input to lightly formatted plain text. That differs from the general goal of jsoup's .text() methods, which is to get clean data from a scrape.

Note that this is a fairly simplistic formatter; for real-world use you'll want to extend it.

To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:

java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]

where url is the URL to fetch, and selector is an optional CSS selector.
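
The same url [selector] behaviour can be approximated from your own code. The following is a minimal sketch, not the example's actual main method: the class name PlainTextCli and the helper format are hypothetical, and it assumes the jsoup jar on the classpath includes the org.jsoup.examples package.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical wrapper class; only Jsoup, Document, Element and
// HtmlToPlainText come from jsoup itself.
class PlainTextCli {

    // Formats the whole document, or only the elements matching the
    // CSS selector when one is supplied (selector may be null).
    static String format(Document doc, String selector) {
        HtmlToPlainText formatter = new HtmlToPlainText();
        if (selector == null) {
            return formatter.getPlainText(doc);
        }
        StringBuilder out = new StringBuilder();
        for (Element match : doc.select(selector)) {
            out.append(formatter.getPlainText(match)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1 || args.length > 2) {
            System.err.println("usage: PlainTextCli url [selector]");
            return;
        }
        // Fetch and parse the page, then print its plain-text rendering.
        Document doc = Jsoup.connect(args[0]).get();
        System.out.print(format(doc, args.length == 2 ? args[1] : null));
    }
}
```

Keeping the formatting in a separate method that takes an already-parsed Document makes it possible to exercise the selector logic on an in-memory HTML string, without a network fetch.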

Author:
Jonathan Hedley

Constructor Summary
HtmlToPlainText()
           
 
Method Summary
 String getPlainText(Element element)
          Format an Element as plain text.
static void main(String... args)
          Command-line entry point: fetches the given URL and prints its plain-text rendering.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlToPlainText

public HtmlToPlainText()

Method Detail

main

public static void main(String... args)
                 throws IOException
Throws:
IOException

getPlainText

public String getPlainText(Element element)
Format an Element as plain text.

Parameters:
element - the root element to format
Returns:
formatted text
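
To illustrate how this differs from jsoup's .text() methods, the sketch below runs both over the same markup. The class name TextVersusPlainText and its two helper methods are hypothetical, the sample HTML is made up for illustration, and the exact whitespace that getPlainText emits may vary between jsoup versions, so no output is shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;

// Hypothetical demo class comparing Element.text() with getPlainText().
class TextVersusPlainText {

    // Clean, whitespace-normalized text, as used when scraping data.
    static String cleanText(String html) {
        return Jsoup.parse(html).body().text();
    }

    // Lightly formatted text that keeps some of the block structure.
    static String plainText(String html) {
        Document doc = Jsoup.parse(html);
        return new HtmlToPlainText().getPlainText(doc.body());
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>First paragraph.</p>"
                + "<ul><li>One</li><li>Two</li></ul>";
        System.out.println(cleanText(html));
        System.out.println("----");
        System.out.println(plainText(html));
    }
}
```

The first call collapses everything onto a single line, while the second keeps line breaks around block elements such as paragraphs and list items.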


Copyright © 2009-2015 Jonathan Hedley. All Rights Reserved.