All Classes and Interfaces
Class
Description
ApplicationEvent published when Heritrix sends a URL to AMQP.
ApplicationEvent published when AMQPUrlReceiver receives a URL.
Bean to enforce a wait for Umbra's AMQP queue.
Example usage:
PDF Content Extractor.
Extracts links to media by running yt-dlp in a subprocess.
YouTube stream URI extractor.
Collection of utility methods useful for loading and storing crawl history.
Enforces quotas on a host.
For Kafka 0.8.x.
A subclass of ExtractorJS that has some customized behavior for specific kinds of web pages.
Wraps a CrawlURI, allowing baseURI to be overridden, without changing the underlying CrawlURI.
Processor for enforcing quotas by source tag (normally the seed url if enabled).
AbstractContentDigestHistory implementation for trough.
Post insert statements for these two tables.
A Processor for retrieving recrawl info from remote Wayback Machine index.