Class TransclusionDecideRule

All Implemented Interfaces:
Serializable, org.archive.spring.HasKeyedProperties

public class TransclusionDecideRule
extends PredicatedDecideRule
Rule ACCEPTs any CrawlURI whose path-from-seed ('hopsPath'; see CrawlURI.getPathFromSeed()) ends with at least one, but no more than the configured maximum number of, non-navlink (non-'L') hops. In all other cases, for example when the path-from-seed is empty or ends in a navlink ('L') hop, this rule returns PASS.

Thus, it allows things like embedded resources (frames/images/media) and redirects to be transitively included ('transcluded') in a crawl, even if they would otherwise be excluded, for a reasonable number of hops (usually 1-5).
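
The hop-counting idea described above can be sketched in a few lines of standalone Java. This is only an illustration under stated assumptions, not the actual Heritrix implementation: the class name TransclusionSketch and the method accepts are hypothetical, the hop letters 'E' (embed) and 'R' (redirect) in the sample paths are assumed, and treating a speculative ('X') hop as also counting toward the non-navlink total is an assumption. A return value of false here corresponds to the rule not ACCEPTing.

```java
public class TransclusionSketch {

    /**
     * Hypothetical helper: returns true when hopsPath ends with at least one,
     * but no more than maxTransHops, non-'L' hops, and no more than
     * maxSpeculativeHops of those trailing hops are speculative ('X').
     */
    static boolean accepts(String hopsPath, int maxTransHops, int maxSpeculativeHops) {
        if (hopsPath == null || hopsPath.isEmpty()) {
            return false; // empty path-from-seed: the rule does not ACCEPT
        }
        int transHops = 0;
        int speculativeHops = 0;
        // Walk backwards from the tail until the most recent navlink hop.
        for (int i = hopsPath.length() - 1; i >= 0; i--) {
            char hop = hopsPath.charAt(i);
            if (hop == 'L') {
                break;
            }
            transHops++;
            if (hop == 'X') {
                speculativeHops++; // assumption: 'X' also counts as a non-navlink hop
            }
        }
        return transHops > 0
                && transHops <= maxTransHops
                && speculativeHops <= maxSpeculativeHops;
    }

    public static void main(String[] args) {
        System.out.println(accepts("LLE", 2, 1));  // true: one trailing non-'L' hop
        System.out.println(accepts("LEEE", 2, 1)); // false: three trailing hops exceed 2
        System.out.println(accepts("LL", 2, 1));   // false: path ends in a navlink
        System.out.println(accepts("LEX", 2, 0));  // false: one 'X' hop exceeds 0
    }
}
```

In this sketch, raising maxTransHops admits longer chains of embeds and redirects after the last navlink, while maxSpeculativeHops separately caps how many of those trailing hops may be speculative.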

Author:
gojomo
See Also:
Transclusion, Serialized Form
  • Constructor Details

    • TransclusionDecideRule

      public TransclusionDecideRule()
      Usual constructor.
  • Method Details

    • getMaxTransHops

      public int getMaxTransHops()
    • setMaxTransHops

      public void setMaxTransHops(int maxTransHops)
      Maximum number of non-navlink (non-'L') hops to ACCEPT.
    • getMaxSpeculativeHops

      public int getMaxSpeculativeHops()
    • setMaxSpeculativeHops

      public void setMaxSpeculativeHops(int maxSpeculativeHops)
      Maximum number of speculative ('X') hops to ACCEPT.
    • evaluate

      protected boolean evaluate(CrawlURI curi)
      Evaluates whether the given CrawlURI is within the acceptable thresholds of transitive hops.
      Specified by:
      evaluate in class PredicatedDecideRule
      Parameters:
      curi - CrawlURI to make decision on.
      Returns:
      true if the number of trailing transitive (non-navlink) hops is greater than 0 and no more than the configured maximum