public class SpiderParam extends AbstractParam
Modifier and Type | Class and Description |
---|---|
static class | SpiderParam.HandleParametersOption: Defines how the parameters are used when checking whether a URI was already visited. |
Modifier and Type | Field and Description |
---|---|
static int | UNLIMITED_DEPTH: The value that indicates that the crawl depth is unlimited. |
Constructor and Description |
---|
SpiderParam(): Instantiates a new SpiderParam. |
Modifier and Type | Method and Description |
---|---|
List<DomainAlwaysInScopeMatcher> | getDomainsAlwaysInScope(): Returns the domains that will always be in scope. |
List<DomainAlwaysInScopeMatcher> | getDomainsAlwaysInScopeEnabled(): Returns the enabled domains that will always be in scope. |
SpiderParam.HandleParametersOption | getHandleParameters(): Gets how the spider handles parameters when checking whether a URI was already visited. |
int | getMaxChildren(): Gets the maximum number of child nodes (per node) that can be crawled; 0 means no limit. |
int | getMaxDepth(): Gets the maximum depth the spider can crawl. |
int | getMaxDuration(): Returns the maximum duration, in minutes, that the spider should run for. |
int | getMaxParseSizeBytes(): Gets the maximum size, in bytes, that a response may have and still be parsed. |
int | getMaxScansInUI() |
int | getRequestWaitTime(): Gets the time between the requests sent to a server. |
String | getScope(): Deprecated. (2.3.0) Replaced by getDomainsAlwaysInScope() and getDomainsAlwaysInScopeEnabled(). |
String | getScopeText(): Deprecated. (2.3.0) Replaced by getDomainsAlwaysInScope() and getDomainsAlwaysInScopeEnabled(). Note: Newer regular expression excluded domains will not be returned by this method. |
String | getSkipURLString(): Gets the skip URL string. |
int | getThreadCount(): Gets the thread count. |
String | getUserAgent(): Gets the user agent. |
boolean | isAcceptCookies(): Tells whether or not a spider process should accept cookies while spidering. |
boolean | isConfirmRemoveDomainAlwaysInScope(): Tells whether or not the removal of a "domain always in scope" needs confirmation. |
boolean | isHandleODataParametersVisited(): Checks if the spider should take into account OData-specific parameters (i.e., resource identifiers) in order to identify already visited URLs. |
boolean | isParseComments(): Checks if the spider should parse the comments. |
boolean | isParseGit(): Checks if the spider should parse the Git files for URIs. |
boolean | isParseRobotsTxt(): Checks if the spider should parse the robots.txt for URIs (not related to following the directions). |
boolean | isParseSitemapXml(): Checks if the spider should parse the sitemap.xml for URIs. |
boolean | isParseSVNEntries(): Checks if the spider should parse the SVN entries files for URIs (not related to following the directions). |
boolean | isPostForm(): Checks if the forms should be submitted with the HTTP POST method. |
boolean | isProcessForm(): Checks if the forms should be processed. |
boolean | isSendRefererHeader(): Tells whether or not the "Referer" header should be sent in spider requests. |
boolean | isShowAdvancedDialog() |
boolean | isSkipURL(org.apache.commons.httpclient.URI uri): Checks if this URL should be skipped. |
protected void | parse(): Parses the configurations. |
void | setAcceptCookies(boolean acceptCookies): Sets whether or not a spider process should accept cookies while spidering. |
void | setConfirmRemoveDomainAlwaysInScope(boolean confirmRemove): Sets whether or not the removal of a "domain always in scope" needs confirmation. |
void | setDomainsAlwaysInScope(List<DomainAlwaysInScopeMatcher> domainsAlwaysInScope): Sets the domains that will always be in scope. |
void | setHandleODataParametersVisited(boolean handleODataParametersVisited): Defines if the spider should handle OData-specific parameters (i.e., resource identifiers) to identify already visited URLs. |
void | setHandleParameters(SpiderParam.HandleParametersOption handleParametersVisited): Sets how the spider handles parameters when checking whether a URI was already visited. |
void | setHandleParameters(String handleParametersVisited): Sets how the spider handles parameters when checking whether a URI was already visited. |
void | setMaxChildren(int maxChildren): Sets the maximum number of child nodes (per node) that can be crawled; 0 means no limit. |
void | setMaxDepth(int maxDepth): Sets the maximum depth the spider can crawl. |
void | setMaxDuration(int maxDuration): Sets the maximum duration, in minutes, that the spider should run for. |
void | setMaxParseSizeBytes(int maxParseSizeBytes): Sets the maximum size, in bytes, that a response may have and still be parsed. |
void | setMaxScansInUI(int maxScansInUI) |
void | setParseComments(boolean parseComments): Sets whether the spider parses the comments. |
void | setParseGit(boolean parseGit): Sets whether the spider parses Git files for URIs. |
void | setParseRobotsTxt(boolean parseRobotsTxt): Sets whether the spider parses the robots.txt for URIs (not related to following the directions). |
void | setParseSitemapXml(boolean parseSitemapXml): Sets whether the spider parses the sitemap.xml for URIs. |
void | setParseSVNEntries(boolean parseSVNentries): Sets whether the spider parses the SVN entries file for URIs (not related to following the directions). |
void | setPostForm(boolean postForm): Sets if the forms should be submitted with the HTTP POST method. |
void | setProcessForm(boolean processForm): Sets if the forms should be processed. |
void | setRequestWaitTime(int requestWait): Sets the time between the requests sent to a server. |
void | setScopeString(String scope): Deprecated. (2.3.0) Replaced by setDomainsAlwaysInScope(List). |
void | setSendRefererHeader(boolean send): Sets whether or not the "Referer" header should be sent in spider requests. |
void | setShowAdvancedDialog(boolean showAdvancedDialog) |
void | setSkipURLString(String skipURL): Sets the skip URL string. |
void | setThreadCount(int thread): Sets the thread count. |
void | setUserAgent(String userAgent): Sets the user agent, if different from the default one. |
clone, getBoolean, getConfig, getInt, getInteger, getString, load, load, load, logConversionException, reset
public static final int UNLIMITED_DEPTH
See Also: setMaxDepth(int), Constant Field Values
protected void parse()
Description copied from class: AbstractParam
Called each time the configurations are loaded.
Specified by: parse in class AbstractParam
See Also: AbstractParam.getConfig()
public int getMaxDepth()
See Also: setMaxDepth(int)
public void setMaxDepth(int maxDepth)
A value of 0 means unlimited depth.
Parameters: maxDepth - the new maximum depth.
See Also: getMaxDepth()
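The depth limit described above can be illustrated with a small stand-alone sketch. It does not use the real SpiderParam class: `DepthLimitSketch` and `withinDepth` are hypothetical names, the constant is redeclared locally with the value 0 stated in the setMaxDepth(int) notes, and the inclusive comparison is an assumption about how a crawler might apply the limit.

```java
// Sketch only: models a depth check that a crawler could derive from
// getMaxDepth(). Names below are hypothetical, not part of SpiderParam.
final class DepthLimitSketch {

    // Assumption: 0 is the "unlimited" sentinel, as the setMaxDepth(int) notes state.
    static final int UNLIMITED_DEPTH = 0;

    /** Returns true if a resource found at the given depth may still be crawled. */
    static boolean withinDepth(int depth, int maxDepth) {
        if (maxDepth == UNLIMITED_DEPTH) {
            return true; // no limit configured
        }
        // Assumption: the limit is inclusive.
        return depth <= maxDepth;
    }

    public static void main(String[] args) {
        System.out.println(withinDepth(7, UNLIMITED_DEPTH)); // true
        System.out.println(withinDepth(5, 5));               // true
        System.out.println(withinDepth(6, 5));               // false
    }
}
```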
@Deprecated public String getScopeText()
Deprecated. (2.3.0) Replaced by getDomainsAlwaysInScope() and getDomainsAlwaysInScopeEnabled(). Note: Newer regular expression excluded domains will not be returned by this method.
@Deprecated public String getScope()
Deprecated. (2.3.0) Replaced by getDomainsAlwaysInScope() and getDomainsAlwaysInScopeEnabled().
@Deprecated public void setScopeString(String scope)
Deprecated. (2.3.0) Replaced by setDomainsAlwaysInScope(List)
Parameters: scope - The scope string to set.
public int getThreadCount()
public void setThreadCount(int thread)
Parameters: thread - The thread count to set.
public boolean isPostForm()
public void setPostForm(boolean postForm)
Parameters: postForm - the new post form status
public boolean isProcessForm()
public void setProcessForm(boolean processForm)
Parameters: processForm - the new process form status
public void setSkipURLString(String skipURL)
Parameters: skipURL - the new skip URL string
public String getSkipURLString()
public boolean isSkipURL(org.apache.commons.httpclient.URI uri)
Parameters: uri - the URI to check
public int getRequestWaitTime()
public void setRequestWaitTime(int requestWait)
Parameters: requestWait - the new request wait time
public String getUserAgent()
public void setUserAgent(String userAgent)
Parameters: userAgent - the new user agent
public boolean isParseComments()
public void setParseComments(boolean parseComments)
Parameters: parseComments - the new value for parseComments
public boolean isParseRobotsTxt()
public boolean isParseSitemapXml()
public boolean isParseSVNEntries()
public boolean isParseGit()
public void setParseRobotsTxt(boolean parseRobotsTxt)
Parameters: parseRobotsTxt - the new value for parseRobotsTxt
public void setParseSitemapXml(boolean parseSitemapXml)
Parameters: parseSitemapXml - the new value for parseSitemapXml
public void setParseSVNEntries(boolean parseSVNentries)
Parameters: parseSVNentries - the new value for parseSVNentries
public void setParseGit(boolean parseGit)
Parameters: parseGit - the new value for parseGit
public SpiderParam.HandleParametersOption getHandleParameters()
public void setHandleParameters(SpiderParam.HandleParametersOption handleParametersVisited)
Parameters: handleParametersVisited - the new handle parameters visited value
public void setHandleParameters(String handleParametersVisited)
The provided parameter is, in this case, a String that is converted to the matching HandleParametersOption value. Possible values are: "USE_ALL", "IGNORE_VALUE", "IGNORE_COMPLETELY".
Parameters: handleParametersVisited - the new handle parameters visited value
Throws: IllegalArgumentException - if the given parameter is not a value of HandleParametersOption.
public boolean isHandleODataParametersVisited()
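The String overload of setHandleParameters described above maps a name onto an enum value and rejects unknown names. The following stand-alone sketch mirrors that behaviour with a local enum, and also illustrates one plausible reading of the three options when deciding whether a URI was already visited. Everything here is an assumption for illustration: `HandleParametersSketch`, `Option`, `parse`, and `visitedKey` are hypothetical names, and the semantics assigned to each option (compare names and values, names only, or no parameters at all) are inferred from the option names rather than confirmed by this page.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch only: a local stand-in for SpiderParam.HandleParametersOption.
final class HandleParametersSketch {

    // Mirrors the three documented option names.
    enum Option { USE_ALL, IGNORE_VALUE, IGNORE_COMPLETELY }

    /** Converts a name to an option; Enum.valueOf throws IllegalArgumentException for unknown names. */
    static Option parse(String name) {
        return Option.valueOf(name);
    }

    /** Builds the key used to compare URIs for the "already visited" check (assumed semantics). */
    static String visitedKey(String path, String query, Option option) {
        if (query == null || query.isEmpty() || option == Option.IGNORE_COMPLETELY) {
            return path; // parameters play no role
        }
        if (option == Option.USE_ALL) {
            return path + "?" + query; // names and values both matter
        }
        // IGNORE_VALUE: keep only the sorted parameter names.
        String names = Arrays.stream(query.split("&"))
                .map(pair -> pair.split("=", 2)[0])
                .sorted()
                .collect(Collectors.joining("&"));
        return path + "?" + names;
    }

    public static void main(String[] args) {
        Option option = parse("IGNORE_VALUE");
        System.out.println(visitedKey("/item", "id=1&sort=asc", option));  // /item?id&sort
        System.out.println(visitedKey("/item", "id=2&sort=desc", option)); // /item?id&sort
        System.out.println(visitedKey("/item", "id=1&sort=asc", Option.IGNORE_COMPLETELY)); // /item
    }
}
```

Under these assumptions, two URIs that differ only in parameter values produce the same key with IGNORE_VALUE, so the second one would be treated as already visited.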
public void setHandleODataParametersVisited(boolean handleODataParametersVisited)
Parameters: handleODataParametersVisited - the new value for handleODataParametersVisited
public List<DomainAlwaysInScopeMatcher> getDomainsAlwaysInScope()
See Also: getDomainsAlwaysInScopeEnabled(), setDomainsAlwaysInScope(List)
public List<DomainAlwaysInScopeMatcher> getDomainsAlwaysInScopeEnabled()
See Also: getDomainsAlwaysInScope(), setDomainsAlwaysInScope(List)
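The relationship between the two getters above (all configured domains versus only the enabled ones) can be sketched without the real classes. In this sketch, `Matcher` is a hypothetical stand-in for DomainAlwaysInScopeMatcher with just a value and an enabled flag, and `enabledOnly` is an assumed helper; the actual implementation may cache or compute the enabled list differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: shows how an "enabled" view can be derived from the full list.
final class EnabledDomainsSketch {

    // Hypothetical stand-in for DomainAlwaysInScopeMatcher.
    static final class Matcher {
        final String value;
        final boolean enabled;

        Matcher(String value, boolean enabled) {
            this.value = value;
            this.enabled = enabled;
        }
    }

    /** Returns only the matchers whose enabled flag is set, preserving order. */
    static List<Matcher> enabledOnly(List<Matcher> all) {
        return all.stream()
                .filter(matcher -> matcher.enabled)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Matcher> all = Arrays.asList(
                new Matcher("example.com", true),
                new Matcher("test.example.org", false));
        for (Matcher matcher : enabledOnly(all)) {
            System.out.println(matcher.value); // example.com
        }
    }
}
```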
public void setDomainsAlwaysInScope(List<DomainAlwaysInScopeMatcher> domainsAlwaysInScope)
Parameters: domainsAlwaysInScope - the domains that will always be in scope.
public boolean isConfirmRemoveDomainAlwaysInScope()
Returns: true if the removal needs confirmation, false otherwise.
public void setConfirmRemoveDomainAlwaysInScope(boolean confirmRemove)
Parameters: confirmRemove - true if the removal needs confirmation, false otherwise.
public int getMaxScansInUI()
public void setMaxScansInUI(int maxScansInUI)
public boolean isShowAdvancedDialog()
public void setShowAdvancedDialog(boolean showAdvancedDialog)
public boolean isSendRefererHeader()
Returns: true if the "Referer" header should be sent in spider requests, false otherwise
public void setSendRefererHeader(boolean send)
Parameters: send - true if the "Referer" header should be sent in spider requests, false otherwise
public int getMaxDuration()
public void setMaxDuration(int maxDuration)
Parameters: maxDuration - the maximum time, in minutes, that the spider should run
public int getMaxChildren()
public void setMaxChildren(int maxChildren)
Parameters: maxChildren - the maximum number of child nodes that can be crawled.
public void setAcceptCookies(boolean acceptCookies)
For example, this might control whether or not the Spider uses the same session throughout a spidering process.
Notes:
User was set or the option Session Tracking (Cookie) is enabled.
Parameters: acceptCookies - true if the spider should accept cookies, false otherwise.
See Also: isAcceptCookies()
public boolean isAcceptCookies()
For example, this might control whether or not the Spider uses the same session throughout a spidering process.
Returns: true if the spider should accept cookies, false otherwise.
See Also: setAcceptCookies(boolean)
public void setMaxParseSizeBytes(int maxParseSizeBytes)
This allows the spider to skip big responses/files.
Parameters: maxParseSizeBytes - the maximum size, in bytes, that a response may have and still be parsed.
See Also: getMaxParseSizeBytes()
public int getMaxParseSizeBytes()
See Also: setMaxParseSizeBytes(int)
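As a rough illustration of how a size limit like this lets a crawler skip large responses, the sketch below compares a response body length against the configured maximum. `ParseSizeSketch` and `shouldParse` are hypothetical names, and treating the limit as inclusive is an assumption; the page does not say how the boundary case is handled.

```java
// Sketch only: models the size check implied by getMaxParseSizeBytes().
final class ParseSizeSketch {

    /** Returns true if a response body of the given length is small enough to parse. */
    static boolean shouldParse(long responseBodyLength, int maxParseSizeBytes) {
        // Assumption: a body exactly at the limit is still parsed.
        return responseBodyLength <= maxParseSizeBytes;
    }

    public static void main(String[] args) {
        int maxParseSizeBytes = 1024 * 1024; // example limit of 1 MiB
        System.out.println(shouldParse(2_048, maxParseSizeBytes));     // true
        System.out.println(shouldParse(5_000_000, maxParseSizeBytes)); // false
    }
}
```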