public class MirrorWriterProcessor extends Processor
There are a number of issues involved:
There would normally be a single instance of this class per Heritrix instance. This class is thread-safe; any number of threads can be in its innerProcess method at once. However, conflicts can still arise in the file system. For example, if several threads try to create the same directory at the same time, only one can win. Therefore, there should be at most one access to a server at a given time.
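A directory-creation race like this one can be tolerated without locking. The following is a minimal sketch, not Heritrix code: the class DirRaceSketch and its methods are hypothetical. It relies on java.io.File.mkdirs() returning false for a thread whose directory was created concurrently by another thread. A losing thread therefore reports failure only if the directory still does not exist afterward.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of race-tolerant directory creation. */
final class DirRaceSketch {

    /**
     * Ensures that dir exists as a directory. When several threads race to
     * create it, mkdirs() succeeds for at most one of them; the others get
     * false but are fine as long as the directory now exists.
     */
    static void ensureDirectory(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create directory " + dir);
        }
    }

    /** Runs nThreads concurrent ensureDirectory calls; returns how many succeeded. */
    static int race(File dir, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < nThreads; i++) {
                results.add(pool.submit(() -> {
                    ensureDirectory(dir);
                    return true;
                }));
            }
            int ok = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) { // get() rethrows any IOException from the task
                    ok++;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("mirror").toFile();
        File target = new File(base, "example.com/a/b");
        System.out.println(race(target, 8) + " threads succeeded; exists: " + target.isDirectory());
    }
}
```

This only handles concurrent directory creation; two threads writing the same file would still conflict, which is why limiting access to one server at a time remains the safer arrangement. java.nio.file.Files.createDirectories offers the same "already exists is not an error" behavior.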
Modifier and Type | Field | Description |
---|---|---|
`static String` | `A_MIRROR_PATH` | |
`protected boolean` | `caseSensitiveFilesystem` | True if the file system is case-sensitive, like UNIX. |
`protected List<String>` | `characterMap` | This list is grouped in pairs. |
`protected List<String>` | `contentTypeMap` | This list is grouped in pairs. |
`protected boolean` | `createHostDirectory` | Create a subdirectory named for the host in the URI. |
`protected boolean` | `createPortDirectory` | Create a subdirectory named for the port in the URI. |
`protected String` | `directoryFile` | Implicitly append this to a URI ending with '/'. |
`protected String` | `dotBegin` | If a segment starts with '.', the '.' is replaced by this. |
`protected String` | `dotEnd` | If a directory name ends with '.', the '.' is replaced by this. |
`protected List<String>` | `hostMap` | This list is grouped in pairs. |
`protected int` | `maxPathLength` | Maximum file system path length. |
`protected int` | `maxSegLength` | Maximum file system path segment length. |
`protected ConfigPath` | `path` | Top-level directory for mirror files. |
`protected boolean` | `suffixAtEnd` | If true, the suffix is placed at the end of the path, after the query (if any). |
`protected String` | `tooLongDirectory` | If all the directories in the URI would exceed, or come close to exceeding, the file system maximum path length, then they are all replaced by this. |
`protected List<String>` | `underscoreSet` | If a directory name appears (case-insensitive) in this list, an underscore is placed before it. |
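The dotBegin, dotEnd and underscoreSet rules in the table can be illustrated with a small helper. This is a sketch under stated assumptions, not the Heritrix implementation: the class SegmentRulesSketch is hypothetical, the rules are applied in the order shown, underscoreSet is compared against the unmodified name, and the sample values ("%2E", "con", "prn") are illustrative rather than Heritrix defaults.

```java
import java.util.List;

/** Hypothetical illustration of the dotBegin, dotEnd and underscoreSet rules. */
final class SegmentRulesSketch {
    private final String dotBegin;
    private final String dotEnd;
    private final List<String> underscoreSet;

    SegmentRulesSketch(String dotBegin, String dotEnd, List<String> underscoreSet) {
        this.dotBegin = dotBegin;
        this.dotEnd = dotEnd;
        this.underscoreSet = underscoreSet;
    }

    /** dotBegin applies to any segment: a leading '.' is replaced. */
    String mapSegment(String seg) {
        return seg.startsWith(".") ? dotBegin + seg.substring(1) : seg;
    }

    /** Directory names get dotBegin, then dotEnd, then the underscore rule. */
    String mapDirectory(String seg) {
        String s = mapSegment(seg);
        // dotEnd: a trailing '.' on a directory name is replaced.
        if (s.endsWith(".")) {
            s = s.substring(0, s.length() - 1) + dotEnd;
        }
        // underscoreSet: case-insensitive match on the original name (assumption).
        for (String name : underscoreSet) {
            if (name.equalsIgnoreCase(seg)) {
                return "_" + s;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        SegmentRulesSketch r = new SegmentRulesSketch("%2E", "%2E", List.of("con", "prn"));
        for (String seg : List.of(".hidden", "dir.", "CON", "docs")) {
            System.out.println(seg + " -> " + r.mapDirectory(seg));
        }
    }
}
```

With these sample values, ".hidden" becomes "%2Ehidden", "dir." becomes "dir%2E", "CON" becomes "_CON", and "docs" is unchanged.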
Constructor and Description |
---|
MirrorWriterProcessor() |
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
public static final String A_MIRROR_PATH
protected boolean caseSensitiveFilesystem
protected List<String> characterMap
protected List<String> contentTypeMap
protected String dotBegin
protected String dotEnd
protected String directoryFile
protected boolean createHostDirectory
protected List<String> hostMap
protected int maxPathLength
protected int maxSegLength
protected ConfigPath path
protected boolean createPortDirectory
protected boolean suffixAtEnd
protected String tooLongDirectory
protected List<String> underscoreSet
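characterMap, contentTypeMap and hostMap are documented only as lists "grouped in pairs". Assuming that means a flat list of the form key1, value1, key2, value2, ..., a hypothetical helper can turn such a list into a map and, for a characterMap-style list, apply it character by character. The class PairListSketch and the single-character-key assumption are illustrative only; the exact matching rules of each field are not stated here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers for lists grouped in pairs; not Heritrix code. */
final class PairListSketch {

    /**
     * Reads a flat list (key, value, key, value, ...) into an insertion-ordered
     * map. An odd-length list cannot be grouped into pairs and is rejected.
     */
    static Map<String, String> toPairs(List<String> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException(
                "List must contain an even number of entries, got " + flat.size());
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            pairs.put(flat.get(i), flat.get(i + 1));
        }
        return pairs;
    }

    /**
     * Replaces each character of s that appears as a one-character key by its
     * paired string (operates on UTF-16 chars; assumes single-char keys).
     */
    static String applyCharacterMap(String s, Map<String, String> charMap) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            String c = String.valueOf(s.charAt(i));
            out.append(charMap.getOrDefault(c, c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = toPairs(List.of("?", "%3F", "*", "%2A"));
        System.out.println(applyCharacterMap("a?b*", map));
    }
}
```

For the sample pairs ("?", "%3F") and ("*", "%2A"), the segment "a?b*" maps to "a%3Fb%2A".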
public boolean getCaseSensitiveFilesystem()
public void setCaseSensitiveFilesystem(boolean sensitive)
public String getDotBegin()
public void setDotBegin(String s)
public String getDotEnd()
public void setDotEnd(String s)
public String getDirectoryFile()
public void setDirectoryFile(String s)
public boolean getCreateHostDirectory()
public void setCreateHostDirectory(boolean hostDir)
public int getMaxPathLength()
public void setMaxPathLength(int max)
public int getMaxSegLength()
public void setMaxSegLength(int max)
public ConfigPath getPath()
public void setPath(ConfigPath s)
public boolean getCreatePortDirectory()
public void setCreatePortDirectory(boolean portDir)
public boolean getSuffixAtEnd()
public void setSuffixAtEnd(boolean suffixAtEnd)
public String getTooLongDirectory()
public void setTooLongDirectory(String s)
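The createHostDirectory, createPortDirectory and directoryFile settings shape the directory layout below the top-level path. The following hypothetical sketch models only that layout. It assumes the host directory precedes the port directory and that a URI without an explicit port uses its scheme default (443 for https, otherwise 80). It ignores the transformations controlled by the other fields (characterMap, the dot rules, length limits, query and suffix handling). The class MirrorPathSketch and the "index.html" value are illustrative, not Heritrix defaults.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the host/port/directoryFile layout; not Heritrix code. */
final class MirrorPathSketch {
    boolean createHostDirectory = true;
    boolean createPortDirectory = false;
    String directoryFile = "index.html"; // illustrative value

    /** Path components, relative to the top-level mirror directory, for uri. */
    List<String> components(URI uri) {
        List<String> parts = new ArrayList<>();
        if (createHostDirectory && uri.getHost() != null) {
            parts.add(uri.getHost());
        }
        if (createPortDirectory) {
            parts.add(portOf(uri));
        }
        String p = uri.getRawPath();
        if (p == null || p.isEmpty()) {
            p = "/"; // treat "http://host" like "http://host/"
        }
        for (String seg : p.split("/")) {
            if (!seg.isEmpty()) {
                parts.add(seg);
            }
        }
        // directoryFile is implicitly appended when the path ends with '/'.
        if (p.endsWith("/")) {
            parts.add(directoryFile);
        }
        return parts;
    }

    /** Explicit port if present, else an assumed scheme default. */
    static String portOf(URI uri) {
        if (uri.getPort() != -1) {
            return Integer.toString(uri.getPort());
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? "443" : "80";
    }

    public static void main(String[] args) {
        MirrorPathSketch m = new MirrorPathSketch();
        System.out.println(String.join("/", m.components(URI.create("http://example.com/docs/"))));
        m.createPortDirectory = true;
        System.out.println(String.join("/", m.components(URI.create("http://example.com:8080/a/b.html"))));
    }
}
```

Under these assumptions, "http://example.com/docs/" maps to example.com/docs/index.html, and with the port directory enabled, "http://example.com:8080/a/b.html" maps to example.com/8080/a/b.html.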
protected boolean shouldProcess(CrawlURI curi)
Specified by: shouldProcess in class Processor
Parameters: curi - the URI to test

protected void innerProcess(CrawlURI curi)
Description copied from class: Processor
By the time this method is called, the URI has already passed the Processor.getEnabled(), Processor.getShouldProcessRule() and Processor.shouldProcess(CrawlURI) tests.
Specified by: innerProcess in class Processor
Parameters: curi - the URI to process

Copyright © 2003–2019 Internet Archive. All rights reserved.