Class ParquetInputFormat<T>

  • Type Parameters:
    T - the type of the materialized records
    Direct Known Subclasses:
    ExampleInputFormat

    public class ParquetInputFormat<T>
    extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,​T>
    The input format to read a Parquet file. It requires an implementation of ReadSupport to materialize the records. The requestedSchema controls how the original records are projected by the loader; it must be a subset of the original schema, and only the columns needed to reconstruct the records with the requestedSchema will be scanned.
    See Also:
    READ_SUPPORT_CLASS, UNBOUND_RECORD_FILTER, STRICT_TYPE_CHECKING, FILTER_PREDICATE, TASK_SIDE_METADATA
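A minimal job-setup sketch, assuming the bundled GroupReadSupport (from parquet-hadoop's example package) as the ReadSupport implementation; the job name and input path are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ParquetJobSetup {
    public static Job configure(Configuration conf, String inputDir) throws Exception {
        Job job = Job.getInstance(conf, "parquet-read");
        // Read Parquet input; GroupReadSupport materializes records as Group objects.
        job.setInputFormatClass(ParquetInputFormat.class);
        ParquetInputFormat.setReadSupportClass(job, GroupReadSupport.class);
        FileInputFormat.addInputPath(job, new Path(inputDir));
        return job;
    }
}
```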
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
    • Constructor Summary

      Constructors 
      Constructor Description
      ParquetInputFormat()
      Hadoop will instantiate this class via this constructor
      ParquetInputFormat​(Class<S> readSupportClass)
      Constructor for subclasses, such as AvroParquetInputFormat, or wrappers.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      org.apache.hadoop.mapreduce.RecordReader<Void,​T> createRecordReader​(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
      static org.apache.parquet.filter2.compat.FilterCompat.Filter getFilter​(org.apache.hadoop.conf.Configuration conf)
      Returns a non-null Filter, which is a wrapper around either a FilterPredicate, an UnboundRecordFilter, or a no-op filter.
      List<Footer> getFooters​(org.apache.hadoop.conf.Configuration configuration, Collection<org.apache.hadoop.fs.FileStatus> statuses)
      Returns the footers for the given files.
      List<Footer> getFooters​(org.apache.hadoop.conf.Configuration configuration, List<org.apache.hadoop.fs.FileStatus> statuses)  
      List<Footer> getFooters​(org.apache.hadoop.mapreduce.JobContext jobContext)  
      GlobalMetaData getGlobalMetaData​(org.apache.hadoop.mapreduce.JobContext jobContext)  
      static Class<?> getReadSupportClass​(org.apache.hadoop.conf.Configuration configuration)  
      static <T> ReadSupport<T> getReadSupportInstance​(org.apache.hadoop.conf.Configuration configuration)  
      List<ParquetInputSplit> getSplits​(org.apache.hadoop.conf.Configuration configuration, List<Footer> footers)
      Deprecated.
      split planning using file footers will be removed
      List<org.apache.hadoop.mapreduce.InputSplit> getSplits​(org.apache.hadoop.mapreduce.JobContext jobContext)
      static Class<?> getUnboundRecordFilter​(org.apache.hadoop.conf.Configuration configuration)
      Deprecated.
      protected boolean isSplitable​(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)  
      static boolean isTaskSideMetaData​(org.apache.hadoop.conf.Configuration configuration)  
      protected List<org.apache.hadoop.fs.FileStatus> listStatus​(org.apache.hadoop.mapreduce.JobContext jobContext)  
      static void setFilterPredicate​(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.filter2.predicate.FilterPredicate filterPredicate)  
      static void setReadSupportClass​(org.apache.hadoop.mapred.JobConf conf, Class<?> readSupportClass)  
      static void setReadSupportClass​(org.apache.hadoop.mapreduce.Job job, Class<?> readSupportClass)  
      static void setTaskSideMetaData​(org.apache.hadoop.mapreduce.Job job, boolean taskSideMetadata)  
      static void setUnboundRecordFilter​(org.apache.hadoop.mapreduce.Job job, Class<? extends org.apache.parquet.filter.UnboundRecordFilter> filterClass)  
      • Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

        addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
    • Field Detail

      • READ_SUPPORT_CLASS

        public static final String READ_SUPPORT_CLASS
        key to configure the ReadSupport implementation
        See Also:
        Constant Field Values
      • STRICT_TYPE_CHECKING

        public static final String STRICT_TYPE_CHECKING
        key to configure type checking for conflicting schemas (default: true)
        See Also:
        Constant Field Values
      • RECORD_FILTERING_ENABLED

        public static final String RECORD_FILTERING_ENABLED
        key to configure whether record-level filtering is enabled
        See Also:
        Constant Field Values
      • STATS_FILTERING_ENABLED

        public static final String STATS_FILTERING_ENABLED
        key to configure whether row group stats filtering is enabled
        See Also:
        Constant Field Values
      • DICTIONARY_FILTERING_ENABLED

        public static final String DICTIONARY_FILTERING_ENABLED
        key to configure whether row group dictionary filtering is enabled
        See Also:
        Constant Field Values
      • COLUMN_INDEX_FILTERING_ENABLED

        public static final String COLUMN_INDEX_FILTERING_ENABLED
        key to configure whether column index filtering of pages is enabled
        See Also:
        Constant Field Values
      • PAGE_VERIFY_CHECKSUM_ENABLED

        public static final String PAGE_VERIFY_CHECKSUM_ENABLED
        key to configure whether page level checksum verification is enabled
        See Also:
        Constant Field Values
      • BLOOM_FILTERING_ENABLED

        public static final String BLOOM_FILTERING_ENABLED
        key to configure whether row group bloom filtering is enabled
        See Also:
        Constant Field Values
      • TASK_SIDE_METADATA

        public static final String TASK_SIDE_METADATA
        key to turn task-side metadata loading on or off (default: true). If true, metadata is read on the task side and some tasks may finish immediately. If false, metadata is read on the client, which is slower when there is a lot of metadata, but tasks are only spawned if there is work to do.
        See Also:
        Constant Field Values
    • Constructor Detail

      • ParquetInputFormat

        public ParquetInputFormat()
        Hadoop will instantiate this class via this constructor
      • ParquetInputFormat

        public ParquetInputFormat​(Class<S> readSupportClass)
        Constructor for subclasses, such as AvroParquetInputFormat, or wrappers.

        Subclasses and wrappers may use this constructor to set the ReadSupport class that will be used when reading instead of requiring the user to set the read support property in their configuration.

        Type Parameters:
        S - the Java read support type
        Parameters:
        readSupportClass - a ReadSupport subclass
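A sketch of the pattern this constructor enables, mirroring what AvroParquetInputFormat does; the wrapper class name here is hypothetical:

```java
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

// Pins the ReadSupport so users need not set the read support
// property in their configuration themselves.
public class GroupParquetInputFormat extends ParquetInputFormat<Group> {
    public GroupParquetInputFormat() {
        super(GroupReadSupport.class);
    }
}
```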
    • Method Detail

      • setTaskSideMetaData

        public static void setTaskSideMetaData​(org.apache.hadoop.mapreduce.Job job,
                                               boolean taskSideMetadata)
      • isTaskSideMetaData

        public static boolean isTaskSideMetaData​(org.apache.hadoop.conf.Configuration configuration)
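A short sketch of setting and reading back the task-side metadata flag on a job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.hadoop.ParquetInputFormat;

public class TaskSideMetadataDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        // Read footers on the task side (the default); some tasks may
        // finish immediately if their row groups are filtered out.
        ParquetInputFormat.setTaskSideMetaData(job, true);
        boolean taskSide = ParquetInputFormat.isTaskSideMetaData(job.getConfiguration());
        System.out.println("task-side metadata: " + taskSide);
    }
}
```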
      • setReadSupportClass

        public static void setReadSupportClass​(org.apache.hadoop.mapreduce.Job job,
                                               Class<?> readSupportClass)
      • setUnboundRecordFilter

        public static void setUnboundRecordFilter​(org.apache.hadoop.mapreduce.Job job,
                                                  Class<? extends org.apache.parquet.filter.UnboundRecordFilter> filterClass)
      • getUnboundRecordFilter

        @Deprecated
        public static Class<?> getUnboundRecordFilter​(org.apache.hadoop.conf.Configuration configuration)
        Deprecated.
        Parameters:
        configuration - a configuration
        Returns:
        an unbound record filter class
      • setReadSupportClass

        public static void setReadSupportClass​(org.apache.hadoop.mapred.JobConf conf,
                                               Class<?> readSupportClass)
      • getReadSupportClass

        public static Class<?> getReadSupportClass​(org.apache.hadoop.conf.Configuration configuration)
      • setFilterPredicate

        public static void setFilterPredicate​(org.apache.hadoop.conf.Configuration configuration,
                                              org.apache.parquet.filter2.predicate.FilterPredicate filterPredicate)
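A sketch of building a predicate with FilterApi and storing it in the configuration; the "age" column is a hypothetical INT32 column in the file being read:

```java
import static org.apache.parquet.filter2.predicate.FilterApi.gtEq;
import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetInputFormat;

public class PredicateDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Keep only rows where "age" is at least 18.
        FilterPredicate pred = gtEq(intColumn("age"), 18);
        ParquetInputFormat.setFilterPredicate(conf, pred);
        // getFilter never returns null: here it wraps the predicate we just set.
        FilterCompat.Filter filter = ParquetInputFormat.getFilter(conf);
        System.out.println(filter != null);
    }
}
```

Depending on how the job is configured, the same predicate can drive row-group stats filtering, dictionary filtering, and record-level filtering.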
      • getFilter

        public static org.apache.parquet.filter2.compat.FilterCompat.Filter getFilter​(org.apache.hadoop.conf.Configuration conf)
        Returns a non-null Filter, which is a wrapper around either a FilterPredicate, an UnboundRecordFilter, or a no-op filter.
        Parameters:
        conf - a configuration
        Returns:
        a non-null Filter wrapping the FilterPredicate or UnboundRecordFilter specified in conf, or a no-op filter if neither is set
      • createRecordReader

        public org.apache.hadoop.mapreduce.RecordReader<Void,​T> createRecordReader​(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                                                                                         org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
                                                                                  throws IOException,
                                                                                         InterruptedException
        Specified by:
        createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<Void,​T>
        Throws:
        IOException
        InterruptedException
      • getReadSupportInstance

        public static <T> ReadSupport<T> getReadSupportInstance​(org.apache.hadoop.conf.Configuration configuration)
        Type Parameters:
        T - the Java type of objects created by the ReadSupport
        Parameters:
        configuration - to find the configuration for the read support
        Returns:
        the configured read support
      • isSplitable

        protected boolean isSplitable​(org.apache.hadoop.mapreduce.JobContext context,
                                      org.apache.hadoop.fs.Path filename)
        Overrides:
        isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,​T>
      • getSplits

        public List<org.apache.hadoop.mapreduce.InputSplit> getSplits​(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                               throws IOException
        Overrides:
        getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,​T>
        Throws:
        IOException
      • getSplits

        @Deprecated
        public List<ParquetInputSplit> getSplits​(org.apache.hadoop.conf.Configuration configuration,
                                                 List<Footer> footers)
                                          throws IOException
        Deprecated.
        split planning using file footers will be removed
        Parameters:
        configuration - the configuration to connect to the file system
        footers - the footers of the files to read
        Returns:
        the splits for the footers
        Throws:
        IOException - if there is an error while reading
      • listStatus

        protected List<org.apache.hadoop.fs.FileStatus> listStatus​(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                            throws IOException
        Overrides:
        listStatus in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,​T>
        Throws:
        IOException
      • getFooters

        public List<Footer> getFooters​(org.apache.hadoop.mapreduce.JobContext jobContext)
                                throws IOException
        Parameters:
        jobContext - the current job context
        Returns:
        the footers for the files
        Throws:
        IOException - if there is an error while reading
      • getFooters

        public List<Footer> getFooters​(org.apache.hadoop.conf.Configuration configuration,
                                       List<org.apache.hadoop.fs.FileStatus> statuses)
                                throws IOException
        Throws:
        IOException
      • getFooters

        public List<Footer> getFooters​(org.apache.hadoop.conf.Configuration configuration,
                                       Collection<org.apache.hadoop.fs.FileStatus> statuses)
                                throws IOException
        Returns the footers for the given files.
        Parameters:
        configuration - to connect to the file system
        statuses - the files to open
        Returns:
        the footers of the files
        Throws:
        IOException - if there is an error while reading
      • getGlobalMetaData

        public GlobalMetaData getGlobalMetaData​(org.apache.hadoop.mapreduce.JobContext jobContext)
                                         throws IOException
        Parameters:
        jobContext - the current job context
        Returns:
        the merged metadata from the footers
        Throws:
        IOException - if there is an error while reading