Package com.yahoo.vespa.hadoop.mapreduce
Class VespaSimpleJsonInputFormat
- java.lang.Object
  - org.apache.hadoop.mapreduce.InputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
      - com.yahoo.vespa.hadoop.mapreduce.VespaSimpleJsonInputFormat
public class VespaSimpleJsonInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
Simple JSON reader which splits the input file along JSON object boundaries. Two cases are handled:
1. Each line contains a JSON object, i.e. { ... }
2. The file contains an array of objects with arbitrary line breaks, i.e. [ {...}, {...} ]
Not suitable for cases where you want to extract objects from some other arbitrary structure.
TODO: Support a config option which points to an array in the JSON as the start point for object extraction, similar to how it is done in VespaHttpClient.parseResultJson, i.e. support rootNode config.
- Author:
- lesters
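The two input layouts described for this class (one object per line, or one array of objects) can both be handled by tracking brace depth while skipping over string literals. The following is a minimal sketch of that idea only. JsonObjectSplitter and splitObjects are hypothetical names introduced for illustration; they are not part of the Vespa Hadoop library, and the actual VespaJsonRecordReader may work differently (for example, by using a streaming JSON parser over the input split).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of com.yahoo.vespa.hadoop. */
final class JsonObjectSplitter {

    private JsonObjectSplitter() {}

    /**
     * Returns the text of every top-level JSON object in the input, in order.
     * Works for newline-delimited objects and for a single array of objects,
     * because anything outside the outermost { ... } pairs is ignored.
     */
    static List<String> splitObjects(String input) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // current nesting depth of { }
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                // Braces inside string literals must not affect the depth.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    // Closed a top-level object: emit it as one record.
                    objects.add(input.substring(start, i + 1));
                }
            }
        }
        return objects;
    }
}
```

Calling JsonObjectSplitter.splitObjects on the contents of either layout yields one string per object, which corresponds to the one-record-per-object behaviour implied by the Text key type of this input format. The sketch reads the whole input into memory and does not deal with split boundaries, both of which a real record reader has to handle.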
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: VespaSimpleJsonInputFormat.VespaJsonRecordReader
Constructor Summary
Constructors
Constructor: VespaSimpleJsonInputFormat()
Method Summary
Methods
Modifier and Type: org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
Method: createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Method Detail
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
- Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
- Throws:
java.io.IOException
java.lang.InterruptedException