Get an SCollection of type SpecificRecord for an Avro file.
Get an SCollection for an object file using default serialization.
Serialized objects are stored in Avro files to leverage Avro's block file format. Note that serialization is not guaranteed to be compatible across Scio releases.
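A minimal sketch of reading an object file, assuming a Scio version in which the Avro methods are brought into scope by importing com.spotify.scio.avro._ and that the file was previously written with saveAsObjectFile. The object name, helper name, and element type are hypothetical and chosen only for illustration.

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.avro._
import com.spotify.scio.values.SCollection

object ObjectFileSketch {
  // Hypothetical helper: reads (word, count) pairs that an earlier pipeline
  // wrote with saveAsObjectFile. The element type must match what was written.
  def readCounts(sc: ScioContext, path: String): SCollection[(String, Long)] =
    sc.objectFile[(String, Long)](path)
}
```

Because the serialization format may change between Scio releases, object files are probably best treated as short-lived intermediate data rather than long-term storage.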
Get an SCollection of type T for data stored in Avro format after applying parseFn to map a serialized GenericRecord to type T.
This API should be used with caution: parseFn reads from a GenericRecord and hence is not type checked.
It is intended for reading GenericRecords without specifying a schema (so the writer schema is used to deserialize) and converting them directly to type T with parseFn. This avoids creating an intermediate SCollection[GenericRecord], which can be inefficient because Coder[GenericRecord] is inefficient without a known Avro schema.
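Example usage: this code reads the Avro fields "id" and "name" and deserializes only those two into CaseClass. The string field is converted with toString because Avro typically decodes strings as Utf8 rather than java.lang.String, so a direct cast to String can fail at runtime.
val sColl: SCollection[CaseClass] = sc.parseAvroFile("gs://.....") { g => CaseClass(g.get("id").asInstanceOf[Int], g.get("name").toString) }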
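Since parseFn is not type checked, one option is to check each field's runtime type explicitly so that a mismatch produces a descriptive error instead of a bare ClassCastException. The following is only a sketch: CaseClass mirrors the type in the example above, and ParseFnSketch and parse are hypothetical names.

```scala
import org.apache.avro.generic.GenericRecord

// Hypothetical target type, mirroring CaseClass in the example above.
final case class CaseClass(id: Int, name: String)

object ParseFnSketch {
  // Converts a GenericRecord to CaseClass, throwing IllegalArgumentException
  // when a field is null or has an unexpected runtime type.
  def parse(g: GenericRecord): CaseClass = {
    val id = g.get("id") match {
      case i: java.lang.Integer => i.intValue
      case other =>
        throw new IllegalArgumentException(s"Unexpected value for id: $other")
    }
    val name = g.get("name") match {
      // CharSequence covers both java.lang.String and Avro's Utf8.
      case cs: CharSequence => cs.toString
      case other =>
        throw new IllegalArgumentException(s"Unexpected value for name: $other")
    }
    CaseClass(id, name)
  }
}
```

The function can then be passed as the parseFn argument, for example sc.parseAvroFile(path)(ParseFnSketch.parse), where path is whatever input location the pipeline uses.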
Get an SCollection for a Protobuf file.
Protobuf messages are serialized into Array[Byte] and stored in Avro files to leverage Avro's block file format.
Get a typed SCollection from an Avro schema.
Note that T must be annotated with AvroType.fromSchema, AvroType.fromPath, or AvroType.toSchema.
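A minimal sketch of the annotation requirement, assuming the AvroType macro annotations live in com.spotify.scio.avro.types and that macro annotations are enabled in the build (the macro paradise compiler plugin on Scala 2.11/2.12, or -Ymacro-annotations on 2.13). The Account type and the helper name are hypothetical.

```scala
import com.spotify.scio.ScioContext
import com.spotify.scio.avro._
import com.spotify.scio.avro.types.AvroType
import com.spotify.scio.values.SCollection

object TypedAvroSketch {
  // Hypothetical record type. The annotation derives an Avro schema from the
  // case class fields at compile time.
  @AvroType.toSchema
  case class Account(id: Int, name: String)

  // Reads Avro records and exposes them as the annotated case class.
  def readAccounts(sc: ScioContext, path: String): SCollection[Account] =
    sc.typedAvroFile[Account](path)
}
```

AvroType.toSchema starts from a case class, while AvroType.fromSchema and AvroType.fromPath go the other way and generate the type from an existing schema, which may be the more natural choice when the Avro files already exist.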
Enhanced version of ScioContext with Avro methods.