org.opencms.util
Class StringBean

java.lang.Object
  extended by org.htmlparser.visitors.NodeVisitor
      extended by org.opencms.util.StringBean
All Implemented Interfaces:
Serializable

public class StringBean
extends org.htmlparser.visitors.NodeVisitor
implements Serializable

Extracts the HTML page content.

See Also:
Serialized Form

Field Summary
protected  StringBuffer m_buffer
          The buffer in which text is stored while traversing the HTML.
protected  boolean m_collapse
          If true, sequences of whitespace characters are replaced with a single space character.
protected  boolean m_isPre
          Set true when traversing a PRE tag.
protected  boolean m_isScript
          Set true when traversing a SCRIPT tag.
protected  boolean m_isStyle
          Set true when traversing a STYLE tag.
protected  boolean m_links
          If true, the link URLs are embedded in the text output.
protected  String m_strings
          The strings extracted from the URL.
 
Constructor Summary
StringBean()
          Create a StringBean object.
 
Method Summary
protected  void carriageReturn()
          Appends a newline to the buffer if there isn't one there already.
protected  void carriageReturn(boolean check)
          Appends a newline to the buffer if there isn't one there already.
protected  void collapse(StringBuffer buffer, String string)
          Add the given text, collapsing whitespace.
 boolean getCollapse()
          Get the current 'collapse whitespace' state.
 boolean getLinks()
          Get the current 'include links' state.
 String getStrings()
          Return the textual contents of the URL.
 void setCollapse(boolean collapse)
          Set the current 'collapse whitespace' state.
 void setLinks(boolean links)
          Set the 'include links' state.
protected  void setStrings()
          Fetch the URL contents.
protected  void updateStrings(String strings)
          Assign the Strings property, firing the property change.
 void visitEndTag(org.htmlparser.Tag tag)
          Resets the state of the PRE and SCRIPT flags.
 void visitStringNode(org.htmlparser.Text string)
          Appends the text to the output.
 void visitTag(org.htmlparser.Tag tag)
          Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags.
 
Methods inherited from class org.htmlparser.visitors.NodeVisitor
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_buffer

protected StringBuffer m_buffer
The buffer in which text is stored while traversing the HTML.


m_collapse

protected boolean m_collapse
If true, sequences of whitespace characters are replaced with a single space character.


m_isPre

protected boolean m_isPre
Set true when traversing a PRE tag.


m_isScript

protected boolean m_isScript
Set true when traversing a SCRIPT tag.


m_isStyle

protected boolean m_isStyle
Set true when traversing a STYLE tag.


m_links

protected boolean m_links
If true, the link URLs are embedded in the text output.


m_strings

protected String m_strings
The strings extracted from the URL.

Constructor Detail

StringBean

public StringBean()
Create a StringBean object. Default property values are set to 'do the right thing':

Links is set to false so that text appears as a browser would display it, albeit without the colour or underline cues normally associated with a link.

ReplaceNonBreakingSpaces is set to true so that printing the text works; set it to false to retain the extra information about these formatting marks.

Collapse is set to true so that text appears compact, as a browser would display it.

Method Detail

getCollapse

public boolean getCollapse()
Get the current 'collapse whitespace' state. If set to true, this emulates the way browsers interpret text: user agents should collapse input white space sequences when producing output inter-word space. See the HTML specification, section 9.1 White space: http://www.w3.org/TR/html4/struct/text.html#h-9.1.

Returns:
true if sequences of whitespace (space '\u0020', tab '\u0009', form feed '\u000C', zero-width space '\u200B', carriage-return '\r' and NEWLINE '\n') are to be replaced with a single space.

getLinks

public boolean getLinks()
Get the current 'include links' state.

Returns:
true if link text is included in the text extracted from the URL, false otherwise.

getStrings

public String getStrings()
Return the textual contents of the URL. This is the primary output of the bean.

Returns:
The user visible (what would be seen in a browser) text.

setCollapse

public void setCollapse(boolean collapse)
Set the current 'collapse whitespace' state. If the setting is changed after the URL has been set, the text from the URL will be reacquired, which is possibly expensive.

Parameters:
collapse - If true, sequences of whitespace will be reduced to a single space.

setLinks

public void setLinks(boolean links)
Set the 'include links' state. If the setting is changed after the URL has been set, the text from the URL will be reacquired, which is possibly expensive.

Parameters:
links - Use true if link text is to be included in the text extracted from the URL, false otherwise.

visitEndTag

public void visitEndTag(org.htmlparser.Tag tag)
Resets the state of the PRE and SCRIPT flags.

Overrides:
visitEndTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The end tag to process.

visitStringNode

public void visitStringNode(org.htmlparser.Text string)
Appends the text to the output.

Overrides:
visitStringNode in class org.htmlparser.visitors.NodeVisitor
Parameters:
string - The text node.

visitTag

public void visitTag(org.htmlparser.Tag tag)
Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags.

Overrides:
visitTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The tag to examine.
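
The interplay of visitTag, visitEndTag and visitStringNode can be illustrated with a simplified model. This is a hypothetical sketch, not the OpenCms source: the class TagFlagSketch, its String-based method signatures, and the helpers visitText, text and inPre are invented for illustration, and it assumes that text inside SCRIPT and STYLE is left out of the output. Newline handling for flow-breaking tags is omitted.

```java
import java.util.Locale;

/** Hypothetical, simplified model of the flag handling; not the real StringBean. */
final class TagFlagSketch {

    private final StringBuilder buffer = new StringBuilder();
    private boolean isPre;
    private boolean isScript;
    private boolean isStyle;

    /** Opening tag: set the flag that matches the tag name, if any. */
    void visitTag(String name) {
        setFlag(name, true);
    }

    /** Closing tag: reset the flag that matches the tag name, if any. */
    void visitEndTag(String name) {
        setFlag(name, false);
    }

    /** Text node: appended unless inside SCRIPT or STYLE (an assumption). */
    void visitText(String text) {
        if (!isScript && !isStyle) {
            buffer.append(text);
        }
    }

    private void setFlag(String name, boolean value) {
        switch (name.toUpperCase(Locale.ROOT)) {
            case "PRE":
                isPre = value;
                break;
            case "SCRIPT":
                isScript = value;
                break;
            case "STYLE":
                isStyle = value;
                break;
            default:
                break; // other tags do not affect the flags
        }
    }

    boolean inPre() {
        return isPre;
    }

    String text() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        TagFlagSketch visitor = new TagFlagSketch();
        visitor.visitText("Hello ");
        visitor.visitTag("script");
        visitor.visitText("alert(1);"); // skipped: the SCRIPT flag is set
        visitor.visitEndTag("script");
        visitor.visitText("world");
        System.out.println(visitor.text());
    }
}
```

In this model the end tag is the only place a flag is cleared, which is why a visitor of this kind depends on the parser reporting matching end tags.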

carriageReturn

protected void carriageReturn()
Appends a newline to the buffer if there isn't one there already, unless the buffer is empty.
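
The rule described here (no newline at the start of the output, and no doubled newline) might look roughly like the following. This is a minimal sketch under stated assumptions: the class CarriageReturnSketch and its static method are hypothetical, the buffer is passed in explicitly rather than read from m_buffer, and the newline sequence is assumed to be the platform line separator.

```java
/** Hypothetical sketch of the documented rule; not the real StringBean. */
final class CarriageReturnSketch {

    /** Assumed newline sequence; the real class may use a different constant. */
    static final String NEWLINE = System.lineSeparator();

    /** Appends NEWLINE unless the buffer is empty or already ends with it. */
    static void carriageReturn(StringBuffer buffer) {
        int length = buffer.length();
        if (length == 0) {
            return; // never start the output with a newline
        }
        boolean endsWithNewline = length >= NEWLINE.length()
            && buffer.substring(length - NEWLINE.length()).equals(NEWLINE);
        if (!endsWithNewline) {
            buffer.append(NEWLINE);
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("first line");
        carriageReturn(buffer);
        carriageReturn(buffer); // no effect: the buffer already ends with NEWLINE
        buffer.append("second line");
        System.out.print(buffer);
    }
}
```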


carriageReturn

protected void carriageReturn(boolean check)
Appends a newline to the buffer if there isn't one there already, unless the buffer is empty.

Parameters:
check - a parameter the developer forgot to comment

collapse

protected void collapse(StringBuffer buffer,
                        String string)
Add the given text, collapsing whitespace. Uses a small finite state machine:
 state 0: whitespace was the last emitted character
 state 1: in whitespace
 state 2: in word
 A whitespace character moves to state 1 and any other character
 moves to state 2, with two exceptions: state 0 stays in state 0 until
 a non-whitespace character arrives, and on going from whitespace to
 word a space is emitted before the character:
    input:     whitespace   other-character
 state\next
    0               0             2
    1               1        space then 2
    2               1             2
 

Parameters:
buffer - The buffer to append to.
string - The string to append.
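
The transition table above can be sketched in plain Java as follows. This is an illustrative reconstruction, not the OpenCms source: the class CollapseSketch and the helper isWhitespace are hypothetical, the whitespace set is taken from the getCollapse description, and the choice of initial state (0 for an empty buffer or one ending in whitespace, 2 otherwise) is an assumption.

```java
/** Hypothetical sketch of the three-state whitespace collapser. */
final class CollapseSketch {

    /** Whitespace as listed for getCollapse(): space, tab, form feed, zero-width space, CR, LF. */
    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\u200B' || c == '\r' || c == '\n';
    }

    /** Appends string to buffer, reducing each whitespace run to a single space. */
    static void collapse(StringBuffer buffer, String string) {
        int length = buffer.length();
        // Assumption: an empty buffer, or one ending in whitespace, starts in state 0.
        int state = (length == 0 || isWhitespace(buffer.charAt(length - 1))) ? 0 : 2;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (isWhitespace(c)) {
                if (state != 0) {
                    state = 1; // states 1 and 2 move to 1; state 0 stays in 0
                }
            } else {
                if (state == 1) {
                    buffer.append(' '); // whitespace to word: emit one space first
                }
                buffer.append(c);
                state = 2;
            }
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer();
        collapse(buffer, "  hello \t\n world  ");
        System.out.println("[" + buffer + "]"); // prints "[hello world]"
    }
}
```

Because the space is emitted only when a word follows, leading whitespace (state 0) and trailing whitespace never reach the buffer in this sketch.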

setStrings

protected void setStrings()
Fetch the URL contents. Only do work if there is a valid parser with its URL set.


updateStrings

protected void updateStrings(String strings)
Assign the Strings property, firing the property change.

Parameters:
strings - The new value of the Strings property.