public class WARCWriterChainProcessor extends BaseWARCWriterProcessor implements HasKeyedProperties
WARC writer processor whose output records are produced by a configurable chain of WARCRecordBuilder implementations (see setChain(List)).
This is the default chain:
<property name="chain">
  <list>
    <bean class="org.archive.modules.warc.DnsResponseRecordBuilder"/>
    <bean class="org.archive.modules.warc.HttpResponseRecordBuilder"/>
    <bean class="org.archive.modules.warc.WhoisResponseRecordBuilder"/>
    <bean class="org.archive.modules.warc.FtpControlConversationRecordBuilder"/>
    <bean class="org.archive.modules.warc.FtpResponseRecordBuilder"/>
    <bean class="org.archive.modules.warc.RevisitRecordBuilder"/>
    <bean class="org.archive.modules.warc.HttpRequestRecordBuilder"/>
    <bean class="org.archive.modules.warc.MetadataRecordBuilder"/>
  </list>
</property>
Replaces WARCWriterProcessor.
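The chain idea can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: FetchedUri, RecordBuilder, builder, buildRecords and exampleChain are hypothetical stand-ins, and the conditions under which each builder fires are invented for illustration only. The sketch assumes that each builder in the list is asked in order whether it applies to the URI and, if so, contributes one record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ChainSketch {

    /** Hypothetical stand-in for a fetched URI; this is not Heritrix's CrawlURI. */
    public static final class FetchedUri {
        final String scheme;
        final boolean duplicate;

        public FetchedUri(String scheme, boolean duplicate) {
            this.scheme = scheme;
            this.duplicate = duplicate;
        }
    }

    /** Hypothetical stand-in for one link in the chain. */
    public interface RecordBuilder {
        boolean shouldBuild(FetchedUri uri);
        String build(FetchedUri uri);
    }

    /** Creates a builder that emits recordType whenever the condition holds. */
    public static RecordBuilder builder(String recordType, Predicate<FetchedUri> when) {
        return new RecordBuilder() {
            @Override public boolean shouldBuild(FetchedUri uri) { return when.test(uri); }
            @Override public String build(FetchedUri uri) { return recordType; }
        };
    }

    /** Walks the chain in list order and collects one record per applicable builder. */
    public static List<String> buildRecords(List<? extends RecordBuilder> chain, FetchedUri uri) {
        List<String> records = new ArrayList<>();
        for (RecordBuilder b : chain) {
            if (b.shouldBuild(uri)) {
                records.add(b.build(uri));
            }
        }
        return records;
    }

    /** A shortened chain loosely following the default order; the conditions are assumptions. */
    public static List<RecordBuilder> exampleChain() {
        return Arrays.asList(
            builder("dns-response", u -> u.scheme.equals("dns")),
            builder("http-response", u -> u.scheme.equals("http") && !u.duplicate),
            builder("revisit", u -> u.scheme.equals("http") && u.duplicate),
            builder("http-request", u -> u.scheme.equals("http")),
            builder("metadata", u -> true));
    }

    public static void main(String[] args) {
        List<RecordBuilder> chain = exampleChain();
        // prints [http-response, http-request, metadata]
        System.out.println(buildRecords(chain, new FetchedUri("http", false)));
        // Leaving the last two builders out of the chain suppresses those records.
        System.out.println(buildRecords(chain.subList(0, 3), new FetchedUri("http", false)));
    }
}
```

In this sketch, removing a builder from the list is all it takes to stop writing that record type, which mirrors the role of the chain property above.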
See Also: WARCRecordBuilder
Fields inherited from class BaseWARCWriterProcessor
generator, stats, urlsWritten
Fields inherited from class WriterPoolProcessor
ANNOTATION_UNWRITTEN, compress, directory, frequentFlushes, maxFileSizeBytes, maxTotalBytesToWrite, maxWaitForIdleMs, poolMaxActive, prefix, serverCache, skipIdenticalDigests, startNewFilesOnCheckpoint, storePaths, template, writeBufferSize
Constructor and Description |
---|
WARCWriterChainProcessor() |
Modifier and Type | Method and Description |
---|---|
List<? extends WARCRecordBuilder> | getChain() |
protected ProcessResult | innerProcessResult(CrawlURI curi) |
void | setChain(List<? extends WARCRecordBuilder> chain) |
protected boolean | shouldWrite(CrawlURI curi) Whether the given CrawlURI should be written to archive files. |
protected ProcessResult | write(CrawlURI curi) |
protected void | writeRecords(CrawlURI curi, org.archive.io.warc.WARCWriter writer) |
Methods inherited from class BaseWARCWriterProcessor
addIfNotBlank, addStats, copyStats, getDefaultMaxFileSize, getDefaultStorePaths, getMetadata, getRecordID, getRecordIDGenerator, getStats, report, setRecordIDGenerator, setupPool, updateMetadataAfterWrite
Methods inherited from class WriterPoolProcessor
calcOutputDirs, checkBytesWritten, copyForwardWriteTagIfDupe, doCheckpoint, fromCheckpointJson, getCompress, getDirectory, getFrequentFlushes, getHostAddress, getMaxFileSizeBytes, getMaxTotalBytesToWrite, getMaxWaitForIdleMs, getMetadataProvider, getPool, getPoolMaxActive, getPrefix, getSerialNo, getServerCache, getSkipIdenticalDigests, getStartNewFilesOnCheckpoint, getStorePaths, getTemplate, getTotalBytesWritten, getWriteBufferSize, innerProcess, innerRejectProcess, setCompress, setDirectory, setFrequentFlushes, setMaxFileSizeBytes, setMaxTotalBytesToWrite, setMaxWaitForIdleMs, setMetadataProvider, setPool, setPoolMaxActive, setPrefix, setServerCache, setSkipIdenticalDigests, setStartNewFilesOnCheckpoint, setStorePaths, setTemplate, setTotalBytesWritten, setWriteBufferSize, shouldProcess, start, stop, toCheckpointJson
Methods inherited from class Processor
finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, startCheckpoint
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface HasKeyedProperties
getKeyedProperties
calcOutputDirs, getCompress, getFrequentFlushes, getMaxFileSizeBytes, getPrefix, getTemplate, getWriteBufferSize
finishCheckpoint, setRecoveryCheckpoint, startCheckpoint
public List<? extends WARCRecordBuilder> getChain()
public void setChain(List<? extends WARCRecordBuilder> chain)
protected boolean shouldWrite(CrawlURI curi)
Whether the given CrawlURI should be written to archive files.
shouldWrite in class WriterPoolProcessor
Parameters: curi - CrawlURI
protected ProcessResult innerProcessResult(CrawlURI curi)
innerProcessResult in class WriterPoolProcessor
protected ProcessResult write(CrawlURI curi) throws IOException
Throws: IOException
protected void writeRecords(CrawlURI curi, org.archive.io.warc.WARCWriter writer) throws IOException
Throws: IOException
Copyright © 2003–2020 Internet Archive. All rights reserved.