java.lang.Object
  org.htmlparser.visitors.NodeVisitor
    org.opencms.util.StringBean
public class StringBean
extends org.htmlparser.visitors.NodeVisitor
Extracts the HTML page content.
Field Summary | |
---|---|
protected StringBuffer | m_buffer - The buffer text is stored in while traversing the HTML. |
protected boolean | m_collapse - If true sequences of whitespace characters are replaced with a single space character. |
protected boolean | m_isPre - Set true when traversing a PRE tag. |
protected boolean | m_isScript - Set true when traversing a SCRIPT tag. |
protected boolean | m_isStyle - Set true when traversing a STYLE tag. |
protected boolean | m_links - If true the link URLs are embedded in the text output. |
protected String | m_strings - The strings extracted from the URL. |
Constructor Summary | |
---|---|
StringBean() | Create a StringBean object. |
Method Summary | |
---|---|
protected void | carriageReturn() - Appends a newline to the buffer if there isn't one there already. |
protected void | carriageReturn(boolean check) - Appends a newline to the buffer if there isn't one there already. |
protected void | collapse(StringBuffer buffer, String string) - Add the given text collapsing whitespace. |
boolean | getCollapse() - Get the current 'collapse whitespace' state. |
boolean | getLinks() - Get the current 'include links' state. |
String | getStrings() - Return the textual contents of the URL. |
void | setCollapse(boolean collapse) - Set the current 'collapse whitespace' state. |
void | setLinks(boolean links) - Set the 'include links' state. |
protected void | setStrings() - Fetch the URL contents. |
protected void | updateStrings(String strings) - Assign the Strings property, firing the property change. |
void | visitEndTag(org.htmlparser.Tag tag) - Resets the state of the PRE and SCRIPT flags. |
void | visitStringNode(org.htmlparser.Text string) - Appends the text to the output. |
void | visitTag(org.htmlparser.Tag tag) - Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags. |
Methods inherited from class org.htmlparser.visitors.NodeVisitor |
---|
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected StringBuffer m_buffer
The buffer text is stored in while traversing the HTML.
protected boolean m_collapse
If true, sequences of whitespace characters are replaced with a single space character.
protected boolean m_isPre
Set true when traversing a PRE tag.
protected boolean m_isScript
Set true when traversing a SCRIPT tag.
protected boolean m_isStyle
Set true when traversing a STYLE tag.
protected boolean m_links
If true, the link URLs are embedded in the text output.
protected String m_strings
The strings extracted from the URL.
Constructor Detail |
---|
public StringBean()
Create a StringBean object. The default property values are:
Links is set false, so text appears like a browser would display it, albeit without the colour or underline clues normally associated with a link.
ReplaceNonBreakingSpaces is set true, so that printing the text works, but the extra information regarding these formatting marks is available if you set it false.
Collapse is set true, so text appears compact like a browser would display it.
Method Detail |
---|
public boolean getCollapse()
Get the current 'collapse whitespace' state. If set to true, this emulates the operation of browsers in interpreting text, where user agents should collapse input white space sequences when producing output inter-word space. See HTML specification section 9.1 White space: http://www.w3.org/TR/html4/struct/text.html#h-9.1.
Returns: true if sequences of whitespace (space '\u0020', tab '\u0009', form feed '\u000C', zero-width space '\u200B', carriage-return '\r' and NEWLINE '\n') are to be replaced with a single space.
public boolean getLinks()
Get the current 'include links' state.
Returns: true if link text is included in the text extracted from the URL, false otherwise.
public String getStrings()
Return the textual contents of the URL.
public void setCollapse(boolean collapse)
Set the current 'collapse whitespace' state.
Parameters:
collapse - If true, sequences of whitespace will be reduced to a single space.
public void setLinks(boolean links)
Set the 'include links' state.
Parameters:
links - Use true if link text is to be included in the text extracted from the URL, false otherwise.
public void visitEndTag(org.htmlparser.Tag tag)
Resets the state of the PRE and SCRIPT flags.
Overrides: visitEndTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The end tag to process.
public void visitStringNode(org.htmlparser.Text string)
Appends the text to the output.
Overrides: visitStringNode in class org.htmlparser.visitors.NodeVisitor
Parameters:
string - The text node.
public void visitTag(org.htmlparser.Tag tag)
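Appends a NEWLINE to the output if the tag breaks flow, and possibly sets the state of the PRE and SCRIPT flags.
Overrides: visitTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - The tag to examine.
protected void carriageReturn()
Appends a newline to the buffer if there isn't one there already.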
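The "append a newline only if one is not already there" behaviour of carriageReturn() can be sketched as follows. This is only an illustration, not the OpenCms source: the class name NewlineSketch is hypothetical, the newline is assumed to be "\n" (the real class may use the platform line separator), and appending to an empty buffer is an assumption taken from a literal reading of the description. The undocumented check parameter of the overload is deliberately not modelled.

```java
// Illustrative sketch only; NewlineSketch is a hypothetical stand-in class.
final class NewlineSketch {

    // Assumption: the newline sequence is a single '\n' character.
    static final char NEWLINE = '\n';

    /** Appends NEWLINE unless the buffer already ends with one. */
    static void carriageReturn(StringBuffer buffer) {
        int length = buffer.length();
        // Assumption: an empty buffer has no trailing newline, so one is appended.
        boolean endsWithNewline = length > 0 && buffer.charAt(length - 1) == NEWLINE;
        if (!endsWithNewline) {
            buffer.append(NEWLINE);
        }
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("line one");
        carriageReturn(buffer); // appends a newline
        carriageReturn(buffer); // no change, a newline is already present
        // Make the newline visible when printing.
        System.out.println(buffer.toString().replace("\n", "\\n"));
    }
}
```

Calling the method repeatedly is harmless, which is presumably why a visitor can invoke it on every flow-breaking tag without producing runs of blank lines.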
protected void carriageReturn(boolean check)
Appends a newline to the buffer if there isn't one there already.
Parameters:
check - a parameter the developer forgot to comment
protected void collapse(StringBuffer buffer, String string)
Add the given text collapsing whitespace. The method works as a small state machine:
state 0: whitespace was last emitted character
state 1: in whitespace
state 2: in word
A whitespace character moves us to state 1 and any other character moves us to state 2, except that state 0 stays in state 0 until a non-whitespace character arrives, and when going from whitespace to word we emit a space before the character:
state\input | whitespace | other-character |
---|---|---|
0 | 0 | 2 |
1 | 1 | space then 2 |
2 | 1 | 2 |
Parameters:
buffer - The buffer to append to.
string - The string to append.
protected void setStrings()
Fetch the URL contents.
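The three-state machine described for collapse(StringBuffer, String) can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not the actual OpenCms code: CollapseSketch and isCollapsible are hypothetical names, the whitespace set is the one listed under getCollapse(), and the choice of initial state (0 for an empty buffer or one ending in a space or newline, otherwise 1 so that separate text fragments do not run together) is an assumption that the documentation does not confirm.

```java
// Illustrative sketch only; CollapseSketch is a hypothetical stand-in class.
final class CollapseSketch {

    /** True for the whitespace characters listed in the getCollapse() description. */
    static boolean isCollapsible(char c) {
        switch (c) {
            case ' ':       // space
            case '\t':      // tab
            case '\f':      // form feed
            case '\u200B':  // zero-width space
            case '\r':      // carriage return
            case '\n':      // newline
                return true;
            default:
                return false;
        }
    }

    /** Appends string to buffer, reducing each whitespace run to one space. */
    static void collapse(StringBuffer buffer, String string) {
        int length = buffer.length();
        int state;
        if (length == 0) {
            state = 0; // nothing emitted yet: behave as if whitespace was last emitted
        } else {
            char last = buffer.charAt(length - 1);
            // Assumption: a buffer ending in a word character starts "in whitespace".
            state = (last == ' ' || last == '\n') ? 0 : 1;
        }
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (isCollapsible(c)) {
                if (state != 0) {
                    state = 1;          // states 1 and 2 move to 1, state 0 stays 0
                }
            } else {
                if (state == 1) {
                    buffer.append(' '); // whitespace to word: emit one space first
                }
                buffer.append(c);
                state = 2;              // now in a word
            }
        }
    }

    public static void main(String[] args) {
        StringBuffer out = new StringBuffer();
        collapse(out, "  Hello \t\n  world  ");
        collapse(out, "again");
        System.out.println("[" + out + "]"); // prints [Hello world again]
    }
}
```

Note that trailing whitespace is never written eagerly: the pending space is emitted only when the next word character arrives, so the buffer never ends with a collapsed space.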
protected void updateStrings(String strings)
Assign the Strings property, firing the property change.
Parameters:
strings - The new value of the Strings property.
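For readers unfamiliar with the phrase "firing the property change", the following sketch shows the standard JavaBeans bound-property pattern using java.beans.PropertyChangeSupport. It is an assumption that updateStrings follows this pattern: the method summary above lists no listener-registration methods, so the real class may simply assign the field. StringsPropertySketch, its listener method and the property name "strings" are hypothetical.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Illustrative sketch only; StringsPropertySketch is a hypothetical stand-in class.
final class StringsPropertySketch {

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String strings;

    // Hypothetical registration method; not listed in the documented API.
    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    String getStrings() {
        return strings;
    }

    /** Assigns the property and notifies listeners of the old and new values. */
    void updateStrings(String newValue) {
        String oldValue = strings;
        strings = newValue;
        // PropertyChangeSupport suppresses the event when both values are equal and non-null.
        support.firePropertyChange("strings", oldValue, newValue);
    }

    public static void main(String[] args) {
        StringsPropertySketch bean = new StringsPropertySketch();
        bean.addPropertyChangeListener(event ->
                System.out.println(event.getOldValue() + " -> " + event.getNewValue()));
        bean.updateStrings("first extraction");
        bean.updateStrings("second extraction");
    }
}
```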