Type Parameters:
K - Key type
V - Value type

public abstract class AbstractBigQueryInputFormat<K,V> extends org.apache.hadoop.mapreduce.InputFormat<K,V> implements DelegateRecordReaderFactory<K,V>
Modifier and Type | Field and Description
---|---
static String | EXTERNAL_TABLE_TYPE: The keyword for the type of a BigQuery table stored externally.
static HadoopConfigurationProperty<Class<?>> | INPUT_FORMAT_CLASS: Configuration key for the InputFormat class name.
Constructor and Description
---
AbstractBigQueryInputFormat()
Modifier and Type | Method and Description
---|---
static void | cleanupJob(BigQueryHelper bigQueryHelper, org.apache.hadoop.conf.Configuration config): Similar to cleanupJob(Configuration, JobID), but allows specifying the Bigquery instance to use.
static void | cleanupJob(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.mapreduce.JobID jobId): Cleans up relevant temporary resources associated with a job which used the GsonBigQueryInputFormat; this should be called explicitly after the completion of the entire job.
org.apache.hadoop.mapreduce.RecordReader<K,V> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.conf.Configuration configuration)
org.apache.hadoop.mapreduce.RecordReader<K,V> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
protected com.google.api.services.bigquery.Bigquery | getBigQuery(org.apache.hadoop.conf.Configuration config): Helper method to override for testing.
protected BigQueryHelper | getBigQueryHelper(org.apache.hadoop.conf.Configuration config): Helper method to override for testing.
abstract ExportFileFormat | getExportFileFormat(): Get the ExportFileFormat that this input format supports.
protected static ExportFileFormat | getExportFileFormat(Class<? extends AbstractBigQueryInputFormat<?,?>> clazz)
protected static ExportFileFormat | getExportFileFormat(org.apache.hadoop.conf.Configuration configuration)
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context)
static void | setInputTable(org.apache.hadoop.conf.Configuration configuration, String projectId, String datasetId, String tableId): Configure the BigQuery input table for a job.
static void | setInputTable(org.apache.hadoop.conf.Configuration configuration, com.google.api.services.bigquery.model.TableReference tableReference): Configure the BigQuery input table for a job.
static void | setTemporaryCloudStorageDirectory(org.apache.hadoop.conf.Configuration configuration, String path): Configure a directory to which we will export BigQuery data.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface DelegateRecordReaderFactory:
createDelegateRecordReader
public static final HadoopConfigurationProperty<Class<?>> INPUT_FORMAT_CLASS

Configuration key for the InputFormat class name.

public static final String EXTERNAL_TABLE_TYPE

The keyword for the type of a BigQuery table stored externally.
public static void setInputTable(org.apache.hadoop.conf.Configuration configuration, String projectId, String datasetId, String tableId) throws IOException

Configure the BigQuery input table for a job.

Throws:
IOException
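Conceptually, the setInputTable overloads record the table coordinates in the job Configuration so the input format can find the table later. A rough, self-contained sketch of that pattern in plain Java, with a Map standing in for Hadoop's Configuration; the key names here are assumptions for illustration, not the connector's documented constants:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in (not the connector's code) for how a
// setInputTable-style helper writes table coordinates into configuration.
public class InputTableConfigSketch {
  // Hypothetical key names, chosen for illustration only.
  static final String PROJECT_ID_KEY = "mapred.bq.input.project.id";
  static final String DATASET_ID_KEY = "mapred.bq.input.dataset.id";
  static final String TABLE_ID_KEY = "mapred.bq.input.table.id";

  // A plain Map standing in for org.apache.hadoop.conf.Configuration.
  final Map<String, String> conf = new HashMap<>();

  // Mirrors setInputTable(Configuration, String, String, String):
  // record which BigQuery table the job should read.
  public void setInputTable(String projectId, String datasetId, String tableId) {
    conf.put(PROJECT_ID_KEY, projectId);
    conf.put(DATASET_ID_KEY, datasetId);
    conf.put(TABLE_ID_KEY, tableId);
  }

  public String get(String key) {
    return conf.get(key);
  }
}
```

The TableReference overload would perform the same writes, pulling the three IDs out of the TableReference object instead of separate arguments.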
public static void setInputTable(org.apache.hadoop.conf.Configuration configuration, com.google.api.services.bigquery.model.TableReference tableReference) throws IOException

Configure the BigQuery input table for a job.

Throws:
IOException
public static void setTemporaryCloudStorageDirectory(org.apache.hadoop.conf.Configuration configuration, String path)

Configure a directory to which we will export BigQuery data.
public abstract ExportFileFormat getExportFileFormat()

Get the ExportFileFormat that this input format supports.

protected static ExportFileFormat getExportFileFormat(org.apache.hadoop.conf.Configuration configuration)

protected static ExportFileFormat getExportFileFormat(Class<? extends AbstractBigQueryInputFormat<?,?>> clazz)
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException, InterruptedException

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<K,V>

Throws:
IOException
InterruptedException
public org.apache.hadoop.mapreduce.RecordReader<K,V> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<K,V>

Throws:
IOException
InterruptedException
public org.apache.hadoop.mapreduce.RecordReader<K,V> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.conf.Configuration configuration) throws IOException, InterruptedException

Throws:
IOException
InterruptedException
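Because AbstractBigQueryInputFormat implements DelegateRecordReaderFactory<K,V>, createRecordReader can hand off to a reader appropriate for the export file format of the concrete subclass. A self-contained analog of that dispatch pattern; the enum values and reader classes below are simplified assumptions, not the connector's actual types:

```java
// Illustrative, self-contained analog (not the connector's code) of how an
// input format can pick a delegate record reader based on the export format.
public class DelegateReaderSketch {
  // Simplified stand-in for the connector's ExportFileFormat enum.
  enum ExportFileFormat { LINE_DELIMITED_JSON, AVRO }

  // Simplified stand-in for a Hadoop RecordReader.
  interface RecordReader {
    String describe();
  }

  static class JsonReader implements RecordReader {
    public String describe() { return "reads newline-delimited JSON"; }
  }

  static class AvroReader implements RecordReader {
    public String describe() { return "reads Avro containers"; }
  }

  // Analogous in spirit to createDelegateRecordReader: choose the reader
  // that matches the format the subclass declared via getExportFileFormat().
  static RecordReader createDelegateRecordReader(ExportFileFormat format) {
    switch (format) {
      case LINE_DELIMITED_JSON:
        return new JsonReader();
      case AVRO:
        return new AvroReader();
      default:
        throw new IllegalArgumentException("Unsupported format: " + format);
    }
  }
}
```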
public static void cleanupJob(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.mapreduce.JobID jobId) throws IOException

Cleans up relevant temporary resources associated with a job which used the GsonBigQueryInputFormat; this should be called explicitly after the completion of the entire job.

Throws:
IOException
public static void cleanupJob(BigQueryHelper bigQueryHelper, org.apache.hadoop.conf.Configuration config) throws IOException

Similar to cleanupJob(Configuration, JobID), but allows specifying the Bigquery instance to use.

Parameters:
bigQueryHelper - The Bigquery API-client helper instance to use.
config - The job Configuration object which contains settings such as whether sharded export was enabled, which GCS directory the export was performed in, etc.

Throws:
IOException
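Conceptually, cleanupJob removes the temporary resources the job created, such as the GCS directory the export was performed in. A self-contained sketch of just the directory-cleanup step, using the local filesystem as a stand-in for GCS (the real method works against GCS and the BigQuery API, not java.nio):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Illustrative stand-in (not the connector's code): recursively delete the
// temporary export directory left behind by a completed job.
public class CleanupJobSketch {
  static void cleanupExportDirectory(Path exportDir) throws IOException {
    if (!Files.exists(exportDir)) {
      return; // Nothing to clean up.
    }
    // Walk the tree and delete children before parents (reverse depth order).
    try (Stream<Path> walk = Files.walk(exportDir)) {
      walk.sorted(Comparator.reverseOrder())
          .forEach(p -> {
            try {
              Files.delete(p);
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          });
    }
  }
}
```

As the class description notes, this cleanup should be invoked explicitly after the entire job completes, typically in a finally block around job execution, so temporary exports are removed even when the job fails.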
protected com.google.api.services.bigquery.Bigquery getBigQuery(org.apache.hadoop.conf.Configuration config) throws GeneralSecurityException, IOException

Helper method to override for testing.

Throws:
IOException - on IO Error.
GeneralSecurityException - on security exception.

protected BigQueryHelper getBigQueryHelper(org.apache.hadoop.conf.Configuration config) throws GeneralSecurityException, IOException

Helper method to override for testing.

Throws:
GeneralSecurityException
IOException
Copyright © 2020. All rights reserved.