All Classes and Interfaces

Class
Description
 
 
 
 
ApplicationEvent published when Heritrix sends a URL to AMQP.
ApplicationEvent published when AMQPUrlReceiver receives a URL.
 
Bean to enforce a wait for Umbra's AMQP queue.
 
 
Example usage:
PDF Content Extractor.
 
Extracts links to media by running yt-dlp in a subprocess.
 
YouTube stream URI extractor.
Collection of utility methods useful for loading and storing crawl history.
Enforces quotas on a host.
For Kafka 0.8.x.
A subclass of ExtractorJS that has some customized behavior for specific kinds of web pages.
Wraps a CrawlURI, allowing baseURI to be overridden, without changing the underlying CrawlURI.
Processor for enforcing quotas by source tag (normally the seed URL, if enabled).
 
AbstractContentDigestHistory implementation for Trough.
Post insert statements for these two tables.
 
A Processor for retrieving recrawl info from a remote Wayback Machine index.