Class PathologicalPathDecideRule

java.lang.Object
org.archive.modules.deciderules.DecideRule
org.archive.modules.deciderules.PathologicalPathDecideRule
All Implemented Interfaces:
Serializable, HasKeyedProperties

public class PathologicalPathDecideRule extends DecideRule
Rule that REJECTs any URI containing an excessive number of identical, consecutive path segments (e.g. http://example.com/a/a/a/boo.html has 3 consecutive '/a' segments).
Author:
gojomo
  • Constructor Details

    • PathologicalPathDecideRule

      public PathologicalPathDecideRule()
      Constructs a new PathologicalPathDecideRule.
  • Method Details

    • getMaxRepetitions

      public int getMaxRepetitions()
    • setMaxRepetitions

      public void setMaxRepetitions(int maxRepetitions)
      Maximum number of times a path segment may repeat consecutively. This rule returns its decision (usually REJECT) if a path segment is repeated more than this number of times.
    • innerDecide

      protected DecideResult innerDecide(CrawlURI uri)
      Specified by:
      innerDecide in class DecideRule
    • constructRegex

      protected String constructRegex(int rep)
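
The repetition threshold described under setMaxRepetitions, and the regex construction suggested by constructRegex(int rep), can be illustrated with a small standalone sketch. This is not the Heritrix source: the class name PathRepetitionSketch and the helpers buildRegex and isPathological are hypothetical, and the pattern shown is only one plausible way to express "a segment followed by at least rep more copies of itself" with a backreference. It assumes rep is at least 1.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Heritrix implementation.
class PathRepetitionSketch {

    // Builds a pattern that matches when some "segment/" is immediately
    // followed by at least `rep` further copies of itself, i.e. the segment
    // occurs more than `rep` times in a row. Assumes rep >= 1.
    static String buildRegex(int rep) {
        return ".*?/(.*?/)\\1{" + rep + ",}.*";
    }

    // Returns true when the URI contains a path segment repeated
    // consecutively more than maxRepetitions times.
    static boolean isPathological(String uri, int maxRepetitions) {
        return Pattern.matches(buildRegex(maxRepetitions), uri);
    }

    public static void main(String[] args) {
        // Three consecutive "a/" segments exceed a limit of 2.
        System.out.println(isPathological("http://example.com/a/a/a/boo.html", 2)); // true
        // Two consecutive "a/" segments stay within a limit of 2.
        System.out.println(isPathological("http://example.com/a/a/boo.html", 2));   // false
    }
}
```

In this sketch a segment only counts when it ends with a slash, so a trailing file name such as boo.html is never treated as a repeated segment.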