Add an artifact to stage in Dataflow. An artifact can be a JAR, a text file, etc. NOTE: currently, artifacts can be added only before the pipeline object is created.
Get an SCollection for an Avro file.
Get an SCollection for a BigQuery SELECT query.
Get an SCollection for a BigQuery table.
Get an SCollection for a BigQuery table.
Close the context. No operation can be performed once the context is closed.
Get an SCollection for a Datastore query.
Create a new DistCache instance.
Google Cloud Storage URIs of the files to be distributed to all workers
function to initialize the distributed files
Create a new DistCache instance.
Google Cloud Storage URI of the file to be distributed to all workers
function to initialize the distributed file
// Prepare distributed cache as Map[Int, String]
val dc = sc.distCache("gs://dataflow-samples/samples/misc/months.txt") { f =>
  scala.io.Source.fromFile(f).getLines().map { s =>
    val t = s.split(" ")
    (t(0).toInt, t(1))
  }.toMap
}
val p: SCollection[Int] = // ...
// Extract distributed cache inside a transform
p.map(x => dc().getOrElse(x, "unknown"))
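The initialization function in the example above can be tried without a pipeline. The following is a minimal sketch in plain Scala, not part of the library: `MonthsCacheSketch` and `parseMonths` are hypothetical names, and the sample lines are made up on the assumption that each line holds an integer key and a value separated by a single space, as the `split(" ")` call suggests.

```scala
// Hypothetical stand-alone helper mirroring the init function passed to distCache.
object MonthsCacheSketch {
  // Parse lines of the assumed form "<int> <name>" into a Map[Int, String].
  def parseMonths(lines: Iterator[String]): Map[Int, String] =
    lines.map { s =>
      val t = s.split(" ")
      (t(0).toInt, t(1))
    }.toMap

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the contents of the distributed file.
    val months = parseMonths(Iterator("1 Jan", "2 Feb", "3 Mar"))
    println(months.getOrElse(2, "unknown"))  // prints "Feb"
    println(months.getOrElse(13, "unknown")) // prints "unknown"
  }
}
```

Inside a transform, the lookup `dc().getOrElse(x, "unknown")` behaves like the `getOrElse` calls here: keys missing from the map fall back to the default value.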
Whether the context is closed.
Create a new Accumulator that keeps track of the maximum value. See SCollection.withAccumulator for examples.
Create a new Accumulator that keeps track of the minimum value. See SCollection.withAccumulator for examples.
Get an SCollection for an object file.
Distribute a local Scala Map to form an SCollection.
Distribute a local Scala Iterable to form an SCollection.
Distribute a local Scala Iterable with timestamps to form an SCollection.
Distribute a local Scala Iterable with timestamps to form an SCollection.
Dataflow pipeline.
Get an SCollection for a Pub/Sub subscription.
Get an SCollection for a Pub/Sub topic.
Set name for the context.
Create a new Accumulator that keeps track of the sum of values. See SCollection.withAccumulator for examples.
Get an SCollection of TableRow for a JSON file.
Get an SCollection for a text file.
Wrap a PCollection.
Main entry point for Dataflow functionality. A ScioContext represents a Dataflow pipeline, and can be used to create SCollections and distributed caches for that pipeline.