Class VespaSimpleJsonInputFormat


  • public class VespaSimpleJsonInputFormat
    extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,​org.apache.hadoop.io.NullWritable>
    Simple JSON reader which splits the input file along JSON object boundaries. Two cases are handled:
      1. Each line contains a JSON object, i.e. { ... }
      2. The file contains an array of objects with arbitrary line breaks, i.e. [ {...}, {...} ]
    Not suitable for cases where you want to extract objects from some other arbitrary structure.
    TODO: Support a config option that points to an array in the JSON as the start point for object extraction, similar to how it is done in VespaHttpClient.parseResultJson, i.e. support a rootNode config.
    Author:
    lesters
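    The two input layouts named in the class description (one object per line, or a single array of objects) can both be split by tracking brace depth while skipping over string contents. The following is a minimal, self-contained sketch of that idea only; it does not reproduce this class's actual implementation, which is not shown on this page. The class name JsonObjectSplitter and the method splitObjects are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Vespa Hadoop library.
public class JsonObjectSplitter {

    /** Returns the text of each top-level JSON object found in the input. */
    public static List<String> splitObjects(String input) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // nesting depth of '{' ... '}'
        int start = -1;           // index where the current object began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash in a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                // Braces inside string values must not affect the depth.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"' && depth > 0) {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i; // a new top-level object begins here
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    objects.add(input.substring(start, i + 1));
                }
            }
            // Anything else outside an object (array brackets, commas,
            // whitespace, newlines) is skipped.
        }
        return objects;
    }

    public static void main(String[] args) {
        // Case 1: one JSON object per line.
        String lines = "{\"id\": 1}\n{\"id\": 2}\n";
        // Case 2: an array of objects with arbitrary line breaks.
        String array = "[ {\"id\": 1},\n  {\"text\": \"a } b\"} ]";
        System.out.println(splitObjects(lines));
        System.out.println(splitObjects(array));
    }
}
```

    Because the sketch only looks for top-level braces, it shares the limitation stated above: objects nested inside some other enclosing structure are returned as part of their parent rather than extracted individually.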
    • Nested Class Summary

      Nested Classes 
      Modifier and Type    Class                                               Description
      static class         VespaSimpleJsonInputFormat.VespaJsonRecordReader
      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
    • Field Summary

      • Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

        DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
    • Method Summary

      Modifier and Type    Method    Description
      org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text, org.apache.hadoop.io.NullWritable>    createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
      • Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

        addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • VespaSimpleJsonInputFormat

        public VespaSimpleJsonInputFormat()
    • Method Detail

      • createRecordReader

        public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text, org.apache.hadoop.io.NullWritable> createRecordReader(
                org.apache.hadoop.mapreduce.InputSplit split,
                org.apache.hadoop.mapreduce.TaskAttemptContext context)
            throws java.io.IOException, java.lang.InterruptedException
        Specified by:
        createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,​org.apache.hadoop.io.NullWritable>
        Throws:
        java.io.IOException
        java.lang.InterruptedException