Class VespaSimpleJsonInputFormat

java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
com.yahoo.vespa.hadoop.mapreduce.VespaSimpleJsonInputFormat

public class VespaSimpleJsonInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
Simple JSON reader that splits the input file along JSON object boundaries. Two cases are handled:

  1. Each line contains a JSON object, i.e. { ... }
  2. The file contains an array of objects with arbitrary line breaks, i.e. [ {...}, {...} ]

Not suitable for cases where you want to extract objects from some other arbitrary structure.

TODO: Support a config option that points to an array in the JSON as the starting point for object extraction, similar to how it is done in VespaHttpClient.parseResultJson, i.e. support a rootNode config.
Author:
lesters
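
The two input layouts described for this class can be illustrated with a small, dependency-free sketch. This is not the class's actual implementation, which is not shown on this page. The class name JsonObjectSplitSketch and the method splitObjects are hypothetical, and the brace-depth scan is only one possible way to find object boundaries, chosen here because it needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Vespa Hadoop library.
final class JsonObjectSplitSketch {

    // Returns the text of every top-level JSON object in the input.
    // Works for one object per line and for an array of objects,
    // because only '{' and '}' outside string literals are counted.
    static List<String> splitObjects(String input) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // current nesting depth of { }
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;      // this character is escaped, skip it
                } else if (c == '\\') {
                    escaped = true;       // next character is escaped
                } else if (c == '"') {
                    inString = false;     // closing quote
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;            // a new top-level object starts here
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    objects.add(input.substring(start, i + 1));
                }
            }
        }
        return objects;
    }
}
```

Calling splitObjects on either "{...}\n{...}" or "[ {...}, {...} ]" returns one string per object. In a Hadoop job, the record reader created by this input format plays the corresponding role: per the type parameters above, each record is delivered as a Text key with a NullWritable value.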
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
     

    Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
  • Field Summary

    Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
    createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
     

    Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • VespaSimpleJsonInputFormat

      public VespaSimpleJsonInputFormat()
  • Method Details

    • createRecordReader

      public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
      Specified by:
      createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
      Throws:
      IOException
      InterruptedException