Package org.archive.modules.extractor
Classes

- PDF Content Extractor.
- Extracts links to media by running yt-dlp in a subprocess.
- Youtube stream URI extractor.
- A subclass of ExtractorJS that has some customized behavior for specific kinds of web pages.
- Wraps a CrawlURI, allowing baseURI to be overridden, without changing the underlying CrawlURI.
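The last entry describes a wrapper that overrides only the base URI while leaving the wrapped CrawlURI untouched. The sketch below illustrates that delegation idea under stated assumptions. It does not use the real Heritrix API: CrawlUriLike, SimpleCrawlUri, BaseOverridingUri and LinkResolver are hypothetical stand-ins. It shows how a relative link resolves differently once the base is overridden.

```java
import java.net.URI;

// Hypothetical minimal stand-in for a crawl URI. The real Heritrix
// CrawlURI has a much larger API; only two accessors are modeled here.
interface CrawlUriLike {
    URI getUri();
    URI getBaseUri();
}

// Hypothetical concrete URI whose base defaults to the URI itself.
final class SimpleCrawlUri implements CrawlUriLike {
    private final URI uri;

    SimpleCrawlUri(String uri) {
        this.uri = URI.create(uri);
    }

    @Override
    public URI getUri() {
        return uri;
    }

    @Override
    public URI getBaseUri() {
        return uri;
    }
}

// Hypothetical wrapper: delegates to the wrapped object for everything
// except getBaseUri(), which returns the override. The wrapped object
// is never modified, so other code holding it sees the original base.
final class BaseOverridingUri implements CrawlUriLike {
    private final CrawlUriLike delegate;
    private final URI baseOverride;

    BaseOverridingUri(CrawlUriLike delegate, String baseOverride) {
        this.delegate = delegate;
        this.baseOverride = URI.create(baseOverride);
    }

    @Override
    public URI getUri() {
        return delegate.getUri();
    }

    @Override
    public URI getBaseUri() {
        return baseOverride;
    }
}

// Hypothetical helper: resolves a relative link against whatever base
// the given context reports, wrapped or not.
final class LinkResolver {
    static URI resolve(CrawlUriLike ctx, String link) {
        return ctx.getBaseUri().resolve(link);
    }
}
```

Usage: resolving `img.png` against a `SimpleCrawlUri` for `https://example.com/a/page.html` yields `https://example.com/a/img.png`. Resolving the same link through a `BaseOverridingUri` with base `https://cdn.example.org/assets/` yields `https://cdn.example.org/assets/img.png`. In both cases the original object's base URI is unchanged. Wrapping rather than mutating keeps the override local to the code that needs it.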