Class RegexSequenceRecordReader
- java.lang.Object
- org.datavec.api.records.reader.BaseRecordReader
- org.datavec.api.records.reader.impl.FileRecordReader
- org.datavec.api.records.reader.impl.regex.RegexSequenceRecordReader
- All Implemented Interfaces:
Closeable, Serializable, AutoCloseable, Configurable, RecordReader, SequenceRecordReader
public class RegexSequenceRecordReader extends FileRecordReader implements SequenceRecordReader
- See Also:
- Serialized Form
-
-
Nested Class Summary
static class RegexSequenceRecordReader.LineErrorHandling
Error handling mode: how should invalid lines (i.e., those that don't match the provided regex) be handled?
- FailOnInvalid: Throw an IllegalStateException when an invalid line is found
- SkipInvalid: Skip invalid lines (quietly, with no warning)
- SkipInvalidWithWarning: Skip invalid lines, but log a warning
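The three modes can be illustrated with a minimal sketch that uses only the JDK. This is not DataVec's implementation: the `Mode` enum and the `parse` helper are hypothetical stand-ins, and the assumption that a line must match the regex in full is ours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineErrorHandlingSketch {

    // Hypothetical stand-in for RegexSequenceRecordReader.LineErrorHandling.
    enum Mode { FAIL_ON_INVALID, SKIP_INVALID, SKIP_INVALID_WITH_WARNING }

    // Parses each line with the regex: one output row per matching line,
    // one value per capture group. Non-matching lines are handled per mode.
    static List<List<String>> parse(List<String> lines, String regex, Mode mode) {
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> out = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            Matcher m = pattern.matcher(line);
            if (m.matches()) {
                List<String> row = new ArrayList<>();
                for (int g = 1; g <= m.groupCount(); g++) {
                    row.add(m.group(g));
                }
                out.add(row);
                continue;
            }
            switch (mode) {
                case FAIL_ON_INVALID:
                    throw new IllegalStateException("Invalid line " + lineNumber + ": " + line);
                case SKIP_INVALID_WITH_WARNING:
                    // A real reader would use its logger; stderr keeps the sketch dependency-free.
                    System.err.println("Skipping invalid line " + lineNumber + ": " + line);
                    break;
                case SKIP_INVALID:
                    break; // drop the line silently
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2016-01-01 INFO start", "garbage", "2016-01-02 WARN slow");
        String regex = "(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)";
        System.out.println(parse(lines, regex, Mode.SKIP_INVALID));
    }
}
```

With the sample input, the skip modes keep the two well-formed lines, while the fail mode stops at the second line with an IllegalStateException.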
-
Field Summary
static Charset DEFAULT_CHARSET
static RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
static org.slf4j.Logger LOG
static String SKIP_NUM_LINES
-
Fields inherited from class org.datavec.api.records.reader.impl.FileRecordReader
appendLabel, conf, currentUri, labels, locationsIterator
-
Fields inherited from class org.datavec.api.records.reader.BaseRecordReader
inputSplit, listeners, streamCreatorFn
-
Fields inherited from interface org.datavec.api.records.reader.RecordReader
APPEND_LABEL, LABELS, NAME_SPACE
-
-
Constructor Summary
RegexSequenceRecordReader(String regex, int skipNumLines)
RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
-
Method Summary
void initialize(Configuration conf, InputSplit split)
Called once at initialization.
List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas)
Load multiple sequence records from the given list of RecordMetaData instances.
SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData)
Load a single sequence record from the given RecordMetaData instance. Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
SequenceRecord nextSequence()
Similar to SequenceRecordReader.sequenceRecord(), but returns a Record object that may include metadata such as the source of the data.
void reset()
Reset the record reader iterator.
List<List<Writable>> sequenceRecord()
Returns a sequence record.
List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream)
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
-
Methods inherited from class org.datavec.api.records.reader.impl.FileRecordReader
close, doInitialize, getConf, getCurrentLabel, getLabel, getLabels, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setConf, setLabels
-
Methods inherited from class org.datavec.api.records.reader.BaseRecordReader
batchesSupported, getListeners, invokeListeners, setListeners, setListeners
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.datavec.api.conf.Configurable
getConf, setConf
-
Methods inherited from interface org.datavec.api.records.reader.RecordReader
batchesSupported, getLabels, getListeners, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setListeners, setListeners
-
-
-
-
Field Detail
-
SKIP_NUM_LINES
public static final String SKIP_NUM_LINES
-
DEFAULT_CHARSET
public static final Charset DEFAULT_CHARSET
-
DEFAULT_ERROR_HANDLING
public static final RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
-
LOG
public static final org.slf4j.Logger LOG
-
-
Constructor Detail
-
RegexSequenceRecordReader
public RegexSequenceRecordReader(String regex, int skipNumLines)
-
RegexSequenceRecordReader
public RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
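How the constructor arguments might interact can be sketched with the JDK alone. The code below is an assumption-laden illustration, not the library's code: `toSequence` is a hypothetical helper, it returns plain strings where the real reader returns `Writable` values, and it always fails on a non-matching line.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConstructorArgsSketch {

    // Decodes the bytes with the given charset, drops the first skipNumLines lines,
    // and turns each remaining line into one time step (one value per capture group).
    static List<List<String>> toSequence(byte[] fileBytes, Charset encoding,
                                         String regex, int skipNumLines) {
        String contents = new String(fileBytes, encoding);
        String[] lines = contents.split("\\r?\\n");
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> sequence = new ArrayList<>();
        for (int i = skipNumLines; i < lines.length; i++) {
            Matcher m = pattern.matcher(lines[i]);
            if (!m.matches()) {
                throw new IllegalStateException("Line " + (i + 1) + " does not match: " + lines[i]);
            }
            List<String> step = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                step.add(m.group(g));
            }
            sequence.add(step);
        }
        return sequence;
    }

    public static void main(String[] args) {
        // One header line to skip, then two data lines with two capture groups each.
        byte[] file = "time,level\n10:00,INFO\n10:05,WARN\n".getBytes(StandardCharsets.UTF_8);
        List<List<String>> sequence =
                toSequence(file, StandardCharsets.UTF_8, "(\\d{2}:\\d{2}),(\\w+)", 1);
        System.out.println(sequence);
    }
}
```

In this sketch, one file yields one sequence: each matching line is a time step, and each capture group is a column within that step.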
-
-
Method Detail
-
initialize
public void initialize(Configuration conf, InputSplit split) throws IOException, InterruptedException
Description copied from interface: RecordReader
Called once at initialization.
- Specified by:
initialize in interface RecordReader
- Overrides:
initialize in class FileRecordReader
- Parameters:
conf - a configuration for initialization
split - the split that defines the range of records to read
- Throws:
IOException
InterruptedException
-
sequenceRecord
public List<List<Writable>> sequenceRecord()
Description copied from interface: SequenceRecordReader
Returns a sequence record.
- Specified by:
sequenceRecord in interface SequenceRecordReader
- Returns:
- a sequence of records
-
sequenceRecord
public List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException
Description copied from interface: SequenceRecordReader
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
- Specified by:
sequenceRecord in interface SequenceRecordReader
- Throws:
IOException - if an error occurs while reading from the input stream
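The requirement that the stream stay open can be met by reading without try-with-resources. The following is a minimal JDK-only sketch under that assumption; `readLines` is a hypothetical helper, not a DataVec method.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class StreamSequenceSketch {

    // Reads all lines from the caller's stream. The reader is deliberately not
    // closed: closing it would also close the DataInputStream it wraps, which
    // the caller still owns.
    static List<String> readLines(DataInputStream in, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "10:00,INFO\n10:05,WARN\n".getBytes(StandardCharsets.UTF_8);
        // The caller opens and closes the stream; the helper only reads from it.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            System.out.println(readLines(in, StandardCharsets.UTF_8));
        }
    }
}
```

The helper keeps no state between calls, which mirrors the note that the reader's internal state is not modified.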
-
reset
public void reset()
Description copied from interface: RecordReader
Reset the record reader iterator.
- Specified by:
reset in interface RecordReader
- Overrides:
reset in class FileRecordReader
-
nextSequence
public SequenceRecord nextSequence()
Description copied from interface: SequenceRecordReader
Similar to SequenceRecordReader.sequenceRecord(), but returns a Record object that may include metadata such as the source of the data.
- Specified by:
nextSequence in interface SequenceRecordReader
- Returns:
- next sequence record
-
loadSequenceFromMetaData
public SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData) throws IOException
Description copied from interface: SequenceRecordReader
Load a single sequence record from the given RecordMetaData instance.
Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
- Specified by:
loadSequenceFromMetaData in interface SequenceRecordReader
- Parameters:
recordMetaData - Metadata for the sequence record that we want to load
- Returns:
- Single sequence record for the given RecordMetaData instance
- Throws:
IOException - If an I/O error occurs during loading
-
loadSequenceFromMetaData
public List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException
Description copied from interface: SequenceRecordReader
Load multiple sequence records from the given list of RecordMetaData instances.
- Specified by:
loadSequenceFromMetaData in interface SequenceRecordReader
- Parameters:
recordMetaDatas - Metadata for the records that we want to load
- Returns:
- Multiple sequence records for the given RecordMetaData instances
- Throws:
IOException - If an I/O error occurs during loading
-
-