Add an artifact to stage in workers. An artifact can be a JAR, a text file, etc. NOTE: currently artifacts can only be added before the pipeline object is created.
Get an SCollection for an Avro file. The schema must not be null if T is of type GenericRecord.
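A minimal sketch of reading GenericRecord Avro, assuming the Scio and Avro APIs described above (the bucket path and the inline schema are illustrative placeholders):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import com.spotify.scio.ScioContext

val sc = ScioContext()

// GenericRecord reads require an explicit, non-null schema.
// This inline schema is a made-up example.
val schema: Schema = new Schema.Parser().parse(
  """{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}""")

val records = sc.avroFile[GenericRecord]("gs://my-bucket/users/*.avro", schema)
```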
Get an SCollection for a BigQuery SELECT query.
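For illustration, a sketch of a SELECT-query read, assuming the public Shakespeare sample dataset; rows come back as TableRow values keyed by column name:

```scala
import com.spotify.scio.ScioContext

val sc = ScioContext()

// Legacy-SQL style table reference, as used elsewhere in these docs.
val query = "SELECT word, word_count FROM [bigquery-public-data:samples.shakespeare] LIMIT 10"

val rows = sc.bigQuerySelect(query)
// TableRow fields are accessed by column name.
val words = rows.map(r => r.get("word").toString)
```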
Get an SCollection for a BigQuery table.
Close the context. No operation can be performed once the context is closed.
Get an SCollection with a custom input transform. The transform should have a unique name.
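A sketch of a custom input, shown here wrapping Beam's TextIO read transform; the exact transform class depends on the Scio/SDK version, and the name and path are placeholders:

```scala
import com.spotify.scio.ScioContext
import org.apache.beam.sdk.io.TextIO

val sc = ScioContext()

// The transform name ("ReadLines") must be unique within the pipeline.
val lines = sc.customInput("ReadLines", TextIO.read().from("gs://my-bucket/input.txt"))
```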
Get an SCollection for a Datastore query.
Whether the context is closed.
Whether this is a test context.
Create a new Accumulator that keeps track of the maximum value. See SCollection.withAccumulator for examples.
Create a new Accumulator that keeps track of the minimum value. See SCollection.withAccumulator for examples.
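A sketch of the accumulator API referenced above, assuming the SCollection.withAccumulator usage pattern; the accumulator names and input data are illustrative:

```scala
import com.spotify.scio.ScioContext

val sc = ScioContext()
val maxLen = sc.maxAccumulator[Int]("maxLineLength")
val minLen = sc.minAccumulator[Int]("minLineLength")

sc.parallelize(Seq("a", "abc", "ab"))
  .withAccumulator(maxLen, minLen)
  .map { (line, ctx) =>
    // Feed each element's length into both accumulators.
    ctx.addValue(maxLen, line.length)
    ctx.addValue(minLen, line.length)
    line
  }
  .toSCollection  // drop the accumulator context and continue as a plain SCollection
```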
Get an SCollection for an object file using default serialization. Serialized objects are stored in Avro files to leverage Avro's block file format. Note that serialization is not guaranteed to be compatible across Scio releases.
Get PipelineOptions as a more specific sub-type.
Distribute a local Scala Map to form an SCollection.
Distribute a local Scala Iterable to form an SCollection.
Distribute a local Scala Iterable with timestamps to form an SCollection.
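The distribute methods above can be sketched as follows; the element values are arbitrary examples:

```scala
import com.spotify.scio.ScioContext
import org.joda.time.Instant

val sc = ScioContext()

// From an in-memory Iterable...
val nums = sc.parallelize(Seq(1, 2, 3, 4))

// ...from a Map, yielding key-value pairs...
val pairs = sc.parallelize(Map("a" -> 1, "b" -> 2))

// ...or with explicit per-element timestamps.
val stamped = sc.parallelizeTimestamped(
  Seq(("a", new Instant(0L)), ("b", new Instant(1000L))))
```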
Underlying pipeline.
Get an SCollection for a Protobuf file. Protobuf messages are serialized into Array[Byte] and stored in Avro files to leverage Avro's block file format.
Get an SCollection for a Pub/Sub subscription.
Get an SCollection for a Pub/Sub topic.
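The two Pub/Sub reads above can be sketched as follows; the project, topic, and subscription names are placeholders:

```scala
import com.spotify.scio.ScioContext

val sc = ScioContext()

// Read messages published to a topic...
val fromTopic = sc.pubsubTopic("projects/my-project/topics/my-topic")

// ...or from an existing subscription.
val fromSub = sc.pubsubSubscription("projects/my-project/subscriptions/my-sub")
```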
Set the application name for the context.
Set the job name for the context (Dataflow only).
Create a new Accumulator that keeps track of the sum of values. See SCollection.withAccumulator for examples.
Get an SCollection for a BigQuery TableRow JSON file.
Get an SCollection for a text file.
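As a usage sketch, a text-file read feeding a simple word count; the input path is a placeholder:

```scala
import com.spotify.scio.ScioContext

val sc = ScioContext()

val counts = sc.textFile("gs://my-bucket/input/*.txt")
  // Split each line on whitespace and drop empty tokens.
  .flatMap(_.split("\\s+").filter(_.nonEmpty))
  // Count occurrences of each distinct word.
  .countByValue
```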
Get an SCollection for a TensorFlow TFRecord file. Note that TFRecord files are not splittable.
Get a typed SCollection for a BigQuery SELECT query or table. Note that T must be annotated with BigQueryType.fromSchema, BigQueryType.fromTable, BigQueryType.fromQuery, or BigQueryType.toTable. By default the source (table or query) specified in the annotation will be used, but it can be overridden with the newSource parameter. For example:

@BigQueryType.fromTable("publicdata:samples.gsod")
class Row

// Read from [publicdata:samples.gsod] as specified in the annotation.
sc.typedBigQuery[Row]()

// Read from [myproject:samples.gsod] instead.
sc.typedBigQuery[Row]("myproject:samples.gsod")

// Read from a query instead.
sc.typedBigQuery[Row]("SELECT * FROM [publicdata:samples.gsod] LIMIT 1000")
Set a custom name for the next transform to be applied.
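A brief sketch of naming the next transform; the name and path are illustrative:

```scala
import com.spotify.scio.ScioContext

val sc = ScioContext()

// "ReadEvents" names only the immediately following transform;
// later steps fall back to default names.
val events = sc.withName("ReadEvents").textFile("gs://my-bucket/events/*.txt")
```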
Wrap a PCollection.
Main entry point for Scio functionality. A ScioContext represents a pipeline and can be used to create SCollections and distributed caches on that cluster.