Creates an akka.stream.scaladsl.Source that reads Parquet data from the specified path.
If there are multiple files at the path then the order in which the files are loaded is determined by the underlying file system.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
Partitioned directories can be read as well, and the filter applies to partition values, too. Partition values are set as fields in the read entities, at the path defined by the partition name. That path can be a simple column name or a dot-separated path to a nested field. Missing intermediate fields are created automatically for each read record.
Take note! Due to an issue with implicit resolution in Scala 2.11 you may need to pass all parameters of ParquetStreams.fromParquet explicitly, even those that have default values. This specifically affects the case when you would like to omit options but define filter. The issue does not occur in Scala 2.12 and 2.13.
type of data that represents the schema of the Parquet data, e.g.:
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
URI to Parquet files, e.g.:
"file:///data/users"
configuration of how Parquet files should be read
optional before-read filter; no filtering is applied by default; check Filter for more details
The source of Parquet data
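A minimal usage sketch for parquet4s 1.x with the Akka module; the data model, path, and filter expression are illustrative, and options and filter are passed explicitly as the Scala 2.11 note above recommends:

  import akka.actor.ActorSystem
  import akka.stream.scaladsl.Sink
  import com.github.mjakubowski84.parquet4s.{Col, ParquetReader, ParquetStreams}

  case class MyData(id: Long, name: String, created: java.sql.Timestamp)

  implicit val system: ActorSystem = ActorSystem()

  // Reads all Parquet files under the given URI; the filter is applied
  // before reading, also to partition values.
  ParquetStreams
    .fromParquet[MyData](
      path = "file:///data/users",
      options = ParquetReader.Options(),
      filter = Col("id") > 100L
    )
    .runWith(Sink.foreach(println))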
Creates an akka.stream.scaladsl.Sink that writes Parquet data to a single file at the specified path (including the file name).
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
type of data that represents the schema of the Parquet data, e.g.:
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
URI to Parquet files, e.g.:
"file:///data/users/users-2019-01-01.parquet"
set of options that define how Parquet files will be created
The sink that writes a Parquet file
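A short sketch of writing a stream into one file; the data and target path are illustrative, and default writer options are assumed:

  import akka.actor.ActorSystem
  import akka.stream.scaladsl.Source
  import com.github.mjakubowski84.parquet4s.{ParquetStreams, ParquetWriter}

  case class MyData(id: Long, name: String, created: java.sql.Timestamp)

  implicit val system: ActorSystem = ActorSystem()
  val now = new java.sql.Timestamp(System.currentTimeMillis())

  // Writes the whole stream into a single Parquet file.
  Source(List(MyData(1L, "Alice", now), MyData(2L, "Bob", now)))
    .runWith(ParquetStreams.toParquetSingleFile(
      path = "file:///data/users/users-2019-01-01.parquet",
      options = ParquetWriter.Options()
    ))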
Builds a flow that writes Parquet data to files at the specified path, splitting and partitioning the output files according to the builder's configuration.
type of message that the flow is meant to accept
URI to Parquet files, e.g.:
"file:///data/users"
Builder of ParquetPartitioningFlow
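A sketch of the builder in use. The configuration methods shown (withMaxCount, withMaxDuration, withPartitionBy) reflect the parquet4s 1.x builder API; treat them as assumptions and verify them against your version:

  import scala.concurrent.duration._
  import akka.actor.ActorSystem
  import akka.stream.scaladsl.{Sink, Source}
  import com.github.mjakubowski84.parquet4s.ParquetStreams

  case class MyData(id: Long, name: String, created: java.sql.Timestamp)

  implicit val system: ActorSystem = ActorSystem()
  val now = new java.sql.Timestamp(System.currentTimeMillis())

  val flow = ParquetStreams
    .viaParquet[MyData]("file:///data/users")
    .withMaxCount(1024 * 1024)   // split the current file after this many records
    .withMaxDuration(30.seconds) // ...or after this much time has elapsed
    .withPartitionBy("name")     // write records into name=<value> subdirectories
    .build()

  // Written records are emitted downstream after they are saved.
  Source(List(MyData(1L, "Alice", now))).via(flow).runWith(Sink.ignore)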
Creates an akka.stream.scaladsl.Sink that writes Parquet data to files at the specified path. The sink splits files when maxChunkSize is reached or when time equal to chunkWriteTimeWindow elapses. Files are named and written to the path according to buildChunkPath; by default the path looks like PATH/part-RANDOM_UUID.parquet. Objects coming into the sink can optionally be transformed using preWriteTransformation and later handled by means of postWriteSink after the transformed object is saved to a file.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
type of incoming objects
type of data that represents the schema of the Parquet data, e.g.:
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
type of the sink's materialized value
URI to Parquet files, e.g.:
"file:///data/users"
maximum number of records that can be saved to a single Parquet file
maximum time that the sink will wait before saving a (non-empty) file
factory function that defines a custom path for each saved file
function that transforms an incoming object into the data that is written to the file
allows one to define an action to be taken after each incoming object is successfully written to the file
set of options that define how Parquet files will be created
The sink that writes Parquet files
(Since version 1.3.0) Use viaParquet instead
Creates an akka.stream.scaladsl.Sink that writes Parquet data to files at the specified path. The sink splits the data into a number of files equal to parallelism; the files are written in parallel and the data is written in an unordered way.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
type of data that represents the schema of the Parquet data, e.g.:
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
URI to Parquet files, e.g.:
"file:///data/users"
defines how many files are created and how many parallel threads are responsible for writing them
set of options that define how Parquet files will be created
The sink that writes Parquet files
(Since version 1.4.0) In the future, viaParquet and toParquetSingleFile may be the only supported writers
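For reference, a sketch of this deprecated sink in use on pre-1.4 versions; the method name toParquetParallelUnordered comes from the parquet4s 1.x API and should be treated as an assumption here:

  import akka.actor.ActorSystem
  import akka.stream.scaladsl.Source
  import com.github.mjakubowski84.parquet4s.{ParquetStreams, ParquetWriter}

  case class MyData(id: Long, name: String, created: java.sql.Timestamp)

  implicit val system: ActorSystem = ActorSystem()
  val now = new java.sql.Timestamp(System.currentTimeMillis())

  // Four files are written concurrently; record order is not preserved.
  Source(List(MyData(1L, "Alice", now), MyData(2L, "Bob", now)))
    .runWith(ParquetStreams.toParquetParallelUnordered(
      path = "file:///data/users",
      parallelism = 4,
      options = ParquetWriter.Options()
    ))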
Creates an akka.stream.scaladsl.Sink that writes Parquet data to files at the specified path. The sink splits the data sequentially into pieces; each file contains at most maxRecordsPerFile records. It is recommended to define maxRecordsPerFile as a multiple of com.github.mjakubowski84.parquet4s.ParquetWriter.Options.rowGroupSize.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Please refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
type of data that represents the schema of the Parquet data, e.g.:
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
URI to Parquet files, e.g.:
"file:///data/users"
the maximum number of records written to a single file
set of options that define how Parquet files will be created
The sink that writes Parquet files
(Since version 1.4.0) In the future, viaParquet and toParquetSingleFile may be the only supported writers
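Again for reference, a sketch of this deprecated sink; the method name toParquetSequentialWithFileSplit comes from the parquet4s 1.x API and, like the record count chosen, is an assumption:

  import akka.actor.ActorSystem
  import akka.stream.scaladsl.Source
  import com.github.mjakubowski84.parquet4s.{ParquetStreams, ParquetWriter}

  case class MyData(id: Long, name: String, created: java.sql.Timestamp)

  implicit val system: ActorSystem = ActorSystem()
  val now = new java.sql.Timestamp(System.currentTimeMillis())

  // A new file is started each time the current one reaches maxRecordsPerFile.
  Source(List(MyData(1L, "Alice", now), MyData(2L, "Bob", now)))
    .runWith(ParquetStreams.toParquetSequentialWithFileSplit(
      path = "file:///data/users",
      maxRecordsPerFile = 1024 * 1024,
      options = ParquetWriter.Options()
    ))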
Holds factories of Akka Streams sources and sinks that allow reading from and writing to Parquet files.