- checkJobIdEquality(Job, Job) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryHelper
-
Helper to check for a non-null Job.getJobReference().getJobId() and equality of the getJobId() between expected and actual, using Preconditions.checkState.
- checkOutputSpecs(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputFormat
-
Checks to make sure the configuration is valid, the output path doesn't already exist, and that
a connection to BigQuery can be established.
- cleanup(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Attempts to manually delete data in the output path.
- cleanupExport() - Method in class com.google.cloud.hadoop.io.bigquery.AbstractExportToCloudStorage
-
- cleanupExport() - Method in interface com.google.cloud.hadoop.io.bigquery.Export
-
Delete any temp tables or temporary data locations.
- cleanupExport() - Method in class com.google.cloud.hadoop.io.bigquery.NoopFederatedExportToCloudStorage
-
- cleanupJob(Configuration, JobID) - Static method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
Cleans up relevant temporary resources associated with a job which used the
GsonBigQueryInputFormat; this should be called explicitly after the completion of the entire
job.
- cleanupJob(BigQueryHelper, Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- close() - Method in class com.google.cloud.hadoop.io.bigquery.AvroRecordReader
-
- close() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryRecordReader
-
- close() - Method in class com.google.cloud.hadoop.io.bigquery.DynamicFileListRecordReader
-
Closes the record reader.
- close() - Method in class com.google.cloud.hadoop.io.bigquery.GsonRecordReader
-
Closes the record reader.
- com.google.cloud.hadoop.io.bigquery - package com.google.cloud.hadoop.io.bigquery
-
- com.google.cloud.hadoop.io.bigquery.output - package com.google.cloud.hadoop.io.bigquery.output
-
- com.google.cloud.hadoop.io.bigquery.samples - package com.google.cloud.hadoop.io.bigquery.samples
-
- commitJob(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.FederatedBigQueryOutputCommitter
-
Runs a federated import job on BigQuery for the data in the output path in addition to calling
the delegate's commitJob.
- commitJob(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Calls the delegate's OutputCommitter.commitJob(JobContext).
- commitJob(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.IndirectBigQueryOutputCommitter
-
Runs an import job on BigQuery for the data in the output path in addition to calling the
delegate's commitJob.
- commitTask(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Calls the delegate's OutputCommitter.commitTask(TaskAttemptContext).
- configuration - Variable in class com.google.cloud.hadoop.io.bigquery.AbstractExportToCloudStorage
-
- configure(Configuration, String, String, String, BigQueryFileFormat, Class<? extends FileOutputFormat>) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
A helper function to set the required output keys in the given configuration.
- configure(Configuration, String, BigQueryTableSchema, String, BigQueryFileFormat, Class<? extends FileOutputFormat>) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
A helper function to set the required output keys in the given configuration.
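The schema-taking configure overload above can be sketched as follows. This is a minimal, hedged wiring example, not connector documentation: it assumes the connector and Hadoop jars are on the classpath, and the project, dataset, table, and bucket names are placeholders.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import com.google.cloud.hadoop.io.bigquery.BigQueryFileFormat;
import com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration;
import com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema;
import com.google.cloud.hadoop.io.bigquery.output.BigQueryTableSchema;

public class OutputConfigSketch {
  static void configureOutput(Configuration conf) throws IOException {
    // Destination table schema; two illustrative columns.
    BigQueryTableSchema schema = new BigQueryTableSchema().setFields(Arrays.asList(
        new BigQueryTableFieldSchema().setName("word").setType("STRING"),
        new BigQueryTableFieldSchema().setName("count").setType("INTEGER")));

    BigQueryOutputConfiguration.configure(
        conf,
        "my-project:my_dataset.output_table",      // qualified output table ID
        schema,                                    // destination table schema
        "gs://my-bucket/tmp/output",               // temporary GCS output path
        BigQueryFileFormat.NEWLINE_DELIMITED_JSON, // intermediate file format
        TextOutputFormat.class);                   // wrapped FileOutputFormat
  }
}
```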
- configureBigQueryInput(Configuration, String, String, String) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Sets the BigQuery access-related fields in the JobConf for the input connector.
- configureBigQueryInput(Configuration, String) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Sets the BigQuery access-related fields in the JobConf for the input connector.
- configureBigQueryOutput(Configuration, String, String, String, String) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Sets the BigQuery access-related fields in the JobConf for the output connector.
- configureBigQueryOutput(Configuration, String, String) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Sets the BigQuery access-related fields in the JobConf for the output connector.
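The configureBigQueryInput and configureBigQueryOutput helpers above can be wired together as in the following sketch. It assumes the connector jar is on the classpath; the project, dataset, and table names are placeholders, and the output schema is passed as a JSON string per the four-String output overload.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration;

public class LegacyConfigSketch {
  static void configure(Configuration conf) throws IOException {
    // Input connector: project, dataset, and table to read from.
    BigQueryConfiguration.configureBigQueryInput(
        conf, "my-project", "my_dataset", "input_table");
    // Output connector: project, dataset, table, and a JSON schema string
    // describing the fields of the destination table.
    BigQueryConfiguration.configureBigQueryOutput(
        conf, "my-project", "my_dataset", "output_table",
        "[{\"name\":\"word\",\"type\":\"STRING\"},"
            + "{\"name\":\"count\",\"type\":\"INTEGER\"}]");
  }
}
```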
- configureWithAutoSchema(Configuration, String, String, BigQueryFileFormat, Class<? extends FileOutputFormat>) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
A helper function to set the required output keys in the given configuration.
- createBigQueryCredential(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryFactory
-
Construct credentials from the passed Configuration.
- createCommitter(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.FederatedBigQueryOutputFormat
-
- createCommitter(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputFormat
-
Create a new OutputCommitter for this OutputFormat.
- createCommitter(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.IndirectBigQueryOutputFormat
-
- createDelegateRecordReader(InputSplit, Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.AvroBigQueryInputFormat
-
- createDelegateRecordReader(InputSplit, Configuration) - Method in interface com.google.cloud.hadoop.io.bigquery.DelegateRecordReaderFactory
-
Create a new record reader for a single input split.
- createDelegateRecordReader(InputSplit, Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat
-
- createDelegateRecordReader(InputSplit, Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat
-
- createJobReference(String, String, String) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryHelper
-
Creates a new JobReference with a unique jobId generated from jobIdPrefix plus a randomly generated UUID String.
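The prefix-plus-UUID pattern behind createJobReference can be sketched with the standard library alone. The exact separator and format the connector emits may differ; this only illustrates the idea that a random UUID makes each jobId unique while the prefix keeps it traceable to the Hadoop job.

```java
import java.util.UUID;

public class JobIdSketch {
    // Sketch of the pattern described above: a caller-supplied prefix plus a
    // random UUID. BigQuery job IDs allow letters, digits, underscores, and
    // dashes, so a raw UUID string is safe to append.
    static String createJobId(String jobIdPrefix) {
        return jobIdPrefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(createJobId("bigquery_job"));
    }
}
```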
- createRecordReader(InputSplit, TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- createRecordReader(InputSplit, Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- createRecordReader(InputSplit, TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat
-
- GCS_BUCKET - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the GCS bucket holding TEMP_GCS_PATH.
- gcsPath - Variable in class com.google.cloud.hadoop.io.bigquery.AbstractExportToCloudStorage
-
- gcsPaths - Variable in class com.google.cloud.hadoop.io.bigquery.NoopFederatedExportToCloudStorage
-
- getAsJson() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTimePartitioning
-
- getBigQuery(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
Helper method to override for testing.
- getBigQuery(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryFactory
-
Constructs a BigQuery from the credential constructed from the environment.
- getBigQueryFromCredential(Configuration, Credential, String) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryFactory
-
Constructs a BigQuery from a given Credential.
- getBigQueryHelper(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
Helper method to override for testing.
- getBigQueryHelper(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryFactory
-
- getBigQueryHelper(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat
-
Helper method to override for testing.
- getBigQueryHelper() - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Gets the helper used to interact with BigQuery.
- getCleanupTemporaryDataFlag(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets whether the configuration flag to clean up temporary data in GCS is enabled.
- getClient(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat
-
Helper method to override for testing.
- getClient(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryRecordReader
-
Helper method to override for testing.
- getCreateDisposition(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the create disposition of the output table.
- getCurrentKey() - Method in class com.google.cloud.hadoop.io.bigquery.AvroRecordReader
-
- getCurrentKey() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryRecordReader
-
- getCurrentKey() - Method in class com.google.cloud.hadoop.io.bigquery.DynamicFileListRecordReader
-
Gets the current key as reported by the delegate record reader.
- getCurrentKey() - Method in class com.google.cloud.hadoop.io.bigquery.GsonRecordReader
-
Gets the current key.
- getCurrentValue() - Method in class com.google.cloud.hadoop.io.bigquery.AvroRecordReader
-
- getCurrentValue() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryRecordReader
-
- getCurrentValue() - Method in class com.google.cloud.hadoop.io.bigquery.DynamicFileListRecordReader
-
Gets the current value.
- getCurrentValue() - Method in class com.google.cloud.hadoop.io.bigquery.GsonRecordReader
-
Gets the current value.
- getDelegate() - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Gets the delegate OutputCommitter being wrapped.
- getDelegate(Configuration) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputFormat
-
Gets a reference to the underlying delegate used by this OutputFormat.
- getExpirationMs() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTimePartitioning
-
- getExportFileFormat() - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
Get the ExportFileFormat that this input format supports.
- getExportFileFormat(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- getExportFileFormat(Class<? extends AbstractBigQueryInputFormat<?, ?>>) - Static method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- getExportFileFormat() - Method in class com.google.cloud.hadoop.io.bigquery.AvroBigQueryInputFormat
-
- getExportFileFormat() - Method in class com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat
-
- getExportFileFormat() - Method in class com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat
-
- getExportPaths() - Method in interface com.google.cloud.hadoop.io.bigquery.Export
-
Get a list of export paths to provide to BigQuery.
- getExportPaths() - Method in class com.google.cloud.hadoop.io.bigquery.NoopFederatedExportToCloudStorage
-
- getExportPaths() - Method in class com.google.cloud.hadoop.io.bigquery.UnshardedExportToCloudStorage
-
- getExtension() - Method in enum com.google.cloud.hadoop.io.bigquery.BigQueryFileFormat
-
Get the default extension to denote the file format.
- getField() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTimePartitioning
-
- getFields() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema
-
Gets the nested schema fields if the type property is set to RECORD.
- getFileFormat(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
- getFileOutputFormat(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets a configured instance of the stored FileOutputFormat in the configuration.
- getFilePattern() - Method in enum com.google.cloud.hadoop.io.bigquery.ExportFileFormat
-
Get the file pattern to use when exporting.
- getFormatIdentifier() - Method in enum com.google.cloud.hadoop.io.bigquery.BigQueryFileFormat
-
Get the identifier to specify in API requests.
- getFormatIdentifier() - Method in enum com.google.cloud.hadoop.io.bigquery.ExportFileFormat
-
Get the identifier to specify in API requests.
- getGcsOutputPath(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the stored GCS output path in the configuration.
- getJobProjectId(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the project ID used to run the BigQuery load job, based on the given configuration.
- getKmsKeyName(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the output table KMS key name based on the given configuration.
- getLength() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit
-
- getLength() - Method in class com.google.cloud.hadoop.io.bigquery.ShardedInputSplit
-
Estimated number of records to read, *not* the number of bytes.
- getLimit() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit
-
- getLocations() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit
-
- getLocations() - Method in class com.google.cloud.hadoop.io.bigquery.ShardedInputSplit
-
- getMode() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema
-
- getName() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit
-
- getName() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema
-
- getOutputCommitter(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputFormat
-
Gets the cached OutputCommitter, creating a new one if it doesn't exist.
- getOutputFileURIs() - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputCommitter
-
Queries the file system for the URIs of all files in the base output directory that are not directories and whose name isn't FileOutputCommitter.SUCCEEDED_FILE_NAME.
- getProgress() - Method in class com.google.cloud.hadoop.io.bigquery.AvroRecordReader
-
- getProgress() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryRecordReader
-
- getProgress() - Method in class com.google.cloud.hadoop.io.bigquery.DynamicFileListRecordReader
-
Returns the current progress based on the number of records read compared to the *estimated*
total number of records planned to be read; this number may be inexact, but will not
report a number greater than 1.
- getProgress() - Method in class com.google.cloud.hadoop.io.bigquery.GsonRecordReader
-
Returns the current progress of the record reader through its data.
- getProjectId(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the output dataset project id based on the given configuration.
- getRawBigquery() - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryHelper
-
Returns the underlying Bigquery instance used for communicating with the BigQuery API.
- getRecordWriter(TaskAttemptContext) - Method in class com.google.cloud.hadoop.io.bigquery.output.ForwardingBigQueryFileOutputFormat
-
Gets the RecordWriter from the wrapped FileOutputFormat.
- getRequirePartitionFilter() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTimePartitioning
-
- getSchema() - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.DirectBigQueryInputSplit
-
- getSchemaFromString(String) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryUtils
-
Parses the given JSON string and returns the extracted schema.
- getShardDirectoryAndPattern() - Method in class com.google.cloud.hadoop.io.bigquery.ShardedInputSplit
-
Accessor for shardDirectoryAndPattern.
- getSplits(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat
-
- getSplits(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat
-
- getSplits(JobContext) - Method in interface com.google.cloud.hadoop.io.bigquery.Export
-
Get input splits that should be passed to Hadoop.
- getSplits(JobContext) - Method in class com.google.cloud.hadoop.io.bigquery.UnshardedExportToCloudStorage
-
- getTable(TableReference) - Method in class com.google.cloud.hadoop.io.bigquery.BigQueryHelper
-
Gets the specified table resource by table ID.
- getTemporaryPathRoot(Configuration, JobID) - Static method in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
- getType() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTableFieldSchema
-
- getType() - Method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryTimePartitioning
-
- getWriteDisposition(Configuration) - Static method in class com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
-
Gets the write disposition of the output table.
- GsonBigQueryInputFormat - Class in com.google.cloud.hadoop.io.bigquery
-
GsonBigQueryInputFormat provides access to BigQuery tables via exports to GCS, in the form of Gson JsonObjects as mapper values.
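A mapper consuming GsonBigQueryInputFormat records might look like the sketch below, where keys are LongWritable record offsets and values are table rows as Gson JsonObjects; the driver would set job.setInputFormatClass(GsonBigQueryInputFormat.class). The "word" column is a hypothetical field name, and the Hadoop and Gson jars are assumed to be on the classpath.

```java
import java.io.IOException;

import com.google.gson.JsonObject;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (column value, 1) pairs from each BigQuery row delivered as a
// Gson JsonObject by GsonBigQueryInputFormat.
public class RowMapper extends Mapper<LongWritable, JsonObject, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);

  @Override
  protected void map(LongWritable key, JsonObject row, Context context)
      throws IOException, InterruptedException {
    // "word" is a placeholder; substitute a field from your own table schema.
    if (row.has("word")) {
      context.write(new Text(row.get("word").getAsString()), ONE);
    }
  }
}
```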
- GsonBigQueryInputFormat() - Constructor for class com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat
-
- GsonRecordReader - Class in com.google.cloud.hadoop.io.bigquery
-
The GsonRecordReader reads records from GCS through GHFS.
- GsonRecordReader() - Constructor for class com.google.cloud.hadoop.io.bigquery.GsonRecordReader
-
- OUTPUT_CLEANUP_TEMP - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key indicating whether temporary data stored in GCS should be deleted after the
output job is complete.
- OUTPUT_DATASET_ID - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the numeric ID of the output dataset accessed by the output format.
- OUTPUT_FILE_FORMAT - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the file format of the files written by the wrapped FileOutputFormat.
- OUTPUT_FORMAT_CLASS - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the FileOutputFormat class that's going to be wrapped by the output
format.
- OUTPUT_PROJECT_ID - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the output project ID of the dataset accessed by the output format.
- OUTPUT_TABLE_CREATE_DISPOSITION - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the create disposition of the output table.
- OUTPUT_TABLE_ID - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the numeric ID of the output table written by the output format.
- OUTPUT_TABLE_KMS_KEY_NAME - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the Cloud KMS encryption key that will be used to protect the output BigQuery table.
- OUTPUT_TABLE_PARTITIONING - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the output table partitioning used by the output format.
- OUTPUT_TABLE_SCHEMA - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the output table schema used by the output format.
- OUTPUT_TABLE_WRITE_DISPOSITION - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Configuration key for the write disposition of the output table.
- OUTPUT_WRITE_BUFFER_SIZE - Static variable in class com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
-
Size of the output buffer, in bytes, to use for BigQuery output.