Class ResourceNoLongerThanDecideRule

All Implemented Interfaces:
Serializable, org.archive.spring.HasKeyedProperties
Direct Known Subclasses:
ResourceLongerThanDecideRule

public class ResourceNoLongerThanDecideRule
extends PredicatedDecideRule
Applies the configured decision to URIs whose content length is less than or equal to a given threshold. Examines either the HTTP Content-Length header or the actual downloaded content length (depending on the useHeaderLength property), and has no effect on resources longer than the threshold. Note that because neither the Content-Length header nor the actual size is available at URI-scoping time, this rule is unusable in crawl scopes. The earliest point at which it can be used is as a mid-fetch rule (in FetchHTTP), when the headers are available but the body is not yet. It can also be used to affect processing after the URI is fully fetched.
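The comparison described above can be sketched as a small standalone program. This is an illustrative model only, not the Heritrix implementation: the class LengthRuleSketch and its methods headerLength and matches are hypothetical names, the behavior when the header is absent or malformed (treated here as "does not match") is an assumption, and the special threshold value -1 is deliberately not modeled.

```java
import java.util.OptionalLong;

/** Hypothetical standalone model of the length comparison; not Heritrix code. */
final class LengthRuleSketch {

    /** Parses a Content-Length header value; empty if absent, malformed, or negative. */
    static OptionalLong headerLength(String headerValue) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            long parsed = Long.parseLong(headerValue.trim());
            return parsed >= 0 ? OptionalLong.of(parsed) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Returns true when the resource is "no longer than" the threshold.
     * With useHeaderLength set, only the declared header length is consulted
     * (the mid-fetch case); otherwise the downloaded size is used.
     */
    static boolean matches(boolean useHeaderLength, String contentLengthHeader,
                           long downloadedSize, long threshold) {
        if (useHeaderLength) {
            OptionalLong declared = headerLength(contentLengthHeader);
            // Assumption: without a usable header the rule does not match.
            return declared.isPresent() && declared.getAsLong() <= threshold;
        }
        return downloadedSize <= threshold;
    }

    public static void main(String[] args) {
        long threshold = 1024;
        // Mid-fetch: only headers are known, so the body size argument is ignored.
        System.out.println(matches(true, "512", 0, threshold));
        System.out.println(matches(true, "4096", 0, threshold));
        // Post-fetch: the downloaded size decides, whatever the header claimed.
        System.out.println(matches(false, "4096", 900, threshold));
    }
}
```

When the predicate is false, the rule as described simply has no effect, which is why a resource longer than the threshold is left to other rules.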
See Also:
Serialized Form
  • Field Details

    • HEADER_PREDICTS_MISSING

      public static final int HEADER_PREDICTS_MISSING
      See Also:
      Constant Field Values
  • Constructor Details

    • ResourceNoLongerThanDecideRule

      public ResourceNoLongerThanDecideRule()
  • Method Details

    • getUseHeaderLength

      public boolean getUseHeaderLength()
    • setUseHeaderLength

      public void setUseHeaderLength(boolean useHeaderLength)
      Sets whether this rule is used as a mid-fetch rule. If true, the rule determines content length from the HTTP Content-Length header; otherwise it uses the size of the already downloaded content.
    • getContentLengthThreshold

      public long getContentLengthThreshold()
    • setContentLengthThreshold

      public void setContentLengthThreshold(long threshold)
      Maximum content length this rule will allow to pass through. A value of -1 means no limit.
    • evaluate

      protected boolean evaluate(CrawlURI curi)
      Specified by:
      evaluate in class PredicatedDecideRule
    • test

      protected boolean test(int contentlength)