AcceptDecideRule |
|
AddRedirectFromRootServerToScope |
|
ContentLengthDecideRule |
|
ContentTypeMatchesRegexDecideRule |
DecideRule whose decision is applied if the URI's content-type
is present and matches the supplied regular expression.
|
ContentTypeNotMatchesRegexDecideRule |
DecideRule whose decision is applied if the URI's content-type
is present and does not match the supplied regular expression.
|
DecideRule |
|
DecideRuleSequence |
|
ExternalGeoLocationDecideRule |
A rule that can be configured to take alternate implementations
of the ExternalGeoLocationInterface.
|
FetchStatusDecideRule |
Rule applies the configured decision for any URI which has a
fetch status equal to the 'target-status' setting.
|
FetchStatusMatchesRegexDecideRule |
|
FetchStatusNotMatchesRegexDecideRule |
|
HasViaDecideRule |
Rule applies the configured decision for any URI which has a 'via'
(essentially, any URI that was not a seed or some kinds of mid-crawl adds).
|
HopCrossesAssignmentLevelDomainDecideRule |
Applies its decision if the current URI differs from its 'via' URI in
that portion of its hostname/domain that is assigned/sold by registrars,
its 'assignment-level-domain' (ALD) (AKA 'public suffix' or, in previous
Heritrix versions, 'topmost assigned SURT').
|
HopsPathMatchesRegexDecideRule |
Rule applies configured decision to any CrawlURIs whose 'hops-path'
(string like "LLXE" etc.) matches the supplied regex.
|
IpAddressSetDecideRule |
IpAddressSetDecideRule must be used with
org.archive.crawler.prefetch.Preselector#setRecheckScope(boolean) set
to true, because it relies on Heritrix's DNS lookup to establish the IP
address for a URI before it can run.
|
MatchesFilePatternDecideRule |
Compares suffix of a passed CrawlURI, UURI, or String against a regular
expression pattern, applying its configured decision to all matches.
|
MatchesListRegexDecideRule |
Rule applies configured decision to any CrawlURIs whose String URI
matches the supplied regexes.
|
MatchesRegexDecideRule |
Rule applies configured decision to any CrawlURIs whose String URI
matches the supplied regex.
|
MatchesStatusCodeDecideRule |
Provides a rule that returns "true" for any CrawlURIs which have a fetch
status code that falls within the provided inclusive range.
|
NotMatchesFilePatternDecideRule |
Rule applies configured decision to any URIs which do *not*
match the supplied (file-pattern) regex.
|
NotMatchesListRegexDecideRule |
Rule applies configured decision to any URIs which do *not*
match the supplied regexes.
|
NotMatchesRegexDecideRule |
Rule applies configured decision to any URIs which do *not*
match the supplied regex.
|
NotMatchesStatusCodeDecideRule |
Provides a rule that returns "true" for any CrawlURIs which have a fetch
status code that does not fall within the provided inclusive range.
|
PathologicalPathDecideRule |
Rule REJECTs any URI which contains an excessive number of identical,
consecutive path-segments (e.g., http://example.com/a/a/a/boo.html has
3 '/a' segments).
|
PredicatedDecideRule |
Rule which applies the configured decision only if a
test evaluates to true.
|
PrerequisiteAcceptDecideRule |
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in
the last hopsPath position).
|
RejectDecideRule |
|
ResourceLongerThanDecideRule |
Applies configured decision for URIs with content length greater than
a given threshold length value.
|
ResourceNoLongerThanDecideRule |
Applies configured decision for URIs with content length less than or equal
to a given threshold length value.
|
ResponseContentLengthDecideRule |
Decide rule that will ACCEPT or REJECT a URI, depending on the
"decision" property, after it has been fetched, if the content body is
within a specified size range, given in bytes.
|
SchemeNotInSetDecideRule |
Rule applies the configured decision (default REJECT) for any URI which
has a URI-scheme NOT contained in the configured Set.
|
ScriptedDecideRule |
Rule which runs a JSR-223 script to make its decision.
|
SeedAcceptDecideRule |
Rule which ACCEPTs all 'seed' URIs (those for which
isSeed is true).
|
SourceSeedDecideRule |
Rule applies the configured decision for any URI discovered from one of
the seeds in sourceSeeds.
|
TooManyHopsDecideRule |
Rule REJECTs any CrawlURIs whose total number of hops (length of the
hopsPath string, traversed links of any type) is over a threshold.
|
TooManyPathSegmentsDecideRule |
Rule REJECTs any CrawlURIs whose total number of path-segments (as
indicated by the count of '/' characters not including the first '//')
is over a given threshold.
|
TransclusionDecideRule |
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath'; see
CrawlURI.getPathFromSeed()) ends with at least one, but not more than
the given number of, non-navlink (non-'L') hops.
|
ViaSurtPrefixedDecideRule |
Rule applies the configured decision for any URI which has a 'via' whose
SURT form matches any SURT prefix specified in the surtPrefixes list.
|
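
Several of the rules above reduce to simple string tests on the URI or on its hops-path. The following is a minimal standalone sketch of those tests, not Heritrix's implementation: the class DecideRuleSketch and all of its method names are hypothetical, it does not use CrawlURI or any other Heritrix type, and the transclusion check is a simplification that treats every non-'L' hop type alike.

```java
import java.util.regex.Pattern;

// Hypothetical helpers illustrating the checks described in the table.
// None of these names exist in Heritrix.
class DecideRuleSketch {

    /** Index just past the first "//", or 0 if the URI has none. */
    static int afterSchemeSlashes(String uri) {
        int sep = uri.indexOf("//");
        return sep >= 0 ? sep + 2 : 0;
    }

    /** Count of '/' characters, not including the first "//". */
    static int countPathSegments(String uri) {
        int count = 0;
        for (int i = afterSchemeSlashes(uri); i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }

    /** Longest run of identical, consecutive, non-empty path segments. */
    static int maxRepeatedSegments(String uri) {
        int pathStart = uri.indexOf('/', afterSchemeSlashes(uri));
        if (pathStart < 0) {
            return 0; // no path at all
        }
        int best = 0;
        int run = 0;
        String prev = null;
        for (String segment : uri.substring(pathStart + 1).split("/")) {
            if (segment.isEmpty()) { // empty segments break a run
                prev = null;
                run = 0;
                continue;
            }
            run = segment.equals(prev) ? run + 1 : 1;
            prev = segment;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Total hops: the length of the hops-path string. */
    static int hopCount(String hopsPath) {
        return hopsPath.length();
    }

    /** True if the whole hops-path (e.g. "LLXE") matches the regex. */
    static boolean hopsPathMatches(String hopsPath, String regex) {
        return Pattern.matches(regex, hopsPath);
    }

    /** True if the last hop is a prerequisite ('P'). */
    static boolean isPrerequisite(String hopsPath) {
        return hopsPath.endsWith("P");
    }

    /** Number of hops at the end of the path that are not 'L'. */
    static int trailingNonNavlinkHops(String hopsPath) {
        int n = 0;
        for (int i = hopsPath.length() - 1; i >= 0 && hopsPath.charAt(i) != 'L'; i--) {
            n++;
        }
        return n;
    }

    /** At least one, but no more than maxTransHops, trailing non-'L' hops. */
    static boolean isTransclusion(String hopsPath, int maxTransHops) {
        int n = trailingNonNavlinkHops(hopsPath);
        return n >= 1 && n <= maxTransHops;
    }

    public static void main(String[] args) {
        String uri = "http://example.com/a/a/a/boo.html";
        System.out.println("path segments: " + countPathSegments(uri));
        System.out.println("max repeated:  " + maxRepeatedSegments(uri));
        System.out.println("hops in LLXE:  " + hopCount("LLXE"));
        System.out.println("transclusion:  " + isTransclusion("LLXE", 2));
    }
}
```

In a rule built on these tests, the result would be compared against a configured threshold (for example, rejecting when hopCount or countPathSegments exceeds a maximum); the thresholds themselves are settings of the respective rules and are not shown here.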