TypedBuilder

com.github.mjakubowski84.parquet4s.parquet.rotatingWriter$.TypedBuilder
trait TypedBuilder[F[_], T, W] extends Builder[F, T, W, TypedBuilder[F, T, W]]

Attributes

Supertypes
trait Builder[F, T, W, TypedBuilder[F, T, W]]
class Object
trait Matchable
class Any

Members list

Value members

Abstract methods

def preWriteTransformation[X](transformation: T => Stream[F, X]): TypedBuilder[F, T, X]

Attributes

X

Schema type

transformation

Function that the stream calls to transform data into the final write format. Defaults to identity.

def write(basePath: Path)(implicit schemaResolver: ParquetSchemaResolver[W], encoder: ParquetRecordEncoder[W]): (F, T) => T

Builds final writer pipe.
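A minimal usage sketch follows, assuming the builder is obtained through viaParquet[IO].of[T] from the parquet4s fs2 module and that write returns an fs2 pipe, as "writer pipe" suggests. The Shirt type, the sample data, the limit values and the /tmp/shirts path are hypothetical, and the project is assumed to have parquet4s-fs2, cats-effect and a Hadoop client on the classpath.

```scala
import cats.effect.{IO, IOApp}
import com.github.mjakubowski84.parquet4s.{Col, Path}
import com.github.mjakubowski84.parquet4s.parquet.viaParquet
import fs2.Stream
import scala.concurrent.duration.DurationInt

// Hypothetical record type; `color` is a String so it can serve as a partition column.
case class Shirt(id: Long, size: Int, color: String)

object RotatingWriteSketch extends IOApp.Simple {

  // Hypothetical sample data.
  private val shirts: Stream[IO, Shirt] =
    Stream.emits(Seq(Shirt(1L, 40, "blue"), Shirt(2L, 42, "green"))).covary[IO]

  def run: IO[Unit] =
    shirts
      .through(
        viaParquet[IO]
          .of[Shirt]
          .chunkSize(16)              // same as the documented default
          .maxCount(1000000L)         // illustrative record limit per file
          .maxDuration(30.seconds)    // illustrative time limit per file
          .partitionBy(Col("color"))  // partition column must be a string field
          .write(Path("/tmp/shirts")) // hypothetical base path
      )
      .compile
      .drain
}
```

The schema resolver and record encoder required by write are expected to be derived implicitly for a plain case class such as Shirt.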

Attributes

Inherited methods

def chunkSize(chunkSize: Int): Self

For better performance, the writer processes data in chunks rather than one record at a time. The default value is 16.

Attributes

chunkSize

default value override

Inherited from:
Builder
def maxCount(maxCount: Long): Self

Attributes

maxCount

max number of records to be written before file rotation

Inherited from:
Builder
def maxDuration(maxDuration: FiniteDuration): Self

Attributes

maxDuration

max time after which partition file is rotated
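How maxCount and maxDuration interact can be modelled in plain Scala. The sketch below is not parquet4s code. It assumes that a file is rotated as soon as either limit is reached, and FileState and shouldRotate are hypothetical names used only for illustration.

```scala
import scala.concurrent.duration.{DurationInt, FiniteDuration}

object RotationSketch {
  // Hypothetical snapshot of a single open partition file.
  final case class FileState(recordsWritten: Long, openFor: FiniteDuration)

  // Assumption: rotation happens when either limit is reached, whichever comes first.
  def shouldRotate(state: FileState, maxCount: Long, maxDuration: FiniteDuration): Boolean =
    state.recordsWritten >= maxCount || state.openFor >= maxDuration
}
```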

Inherited from:
Builder
def options(writeOptions: Options): Self

Attributes

writeOptions

writer options used by the flow

Inherited from:
Builder
def partitionBy(partitionBy: ColumnPath*): Self

Sets the partition paths by which the stream partitions data. Can be empty. A partition path can be a simple string column (e.g. "color") or a path pointing to a nested string field (e.g. "user.address.postcode"). The partition path is used to extract data from the entity and to create a tree of subdirectories for the partitioned files. Using the partitions mentioned above results in, for example, the following tree:

../color=blue
     /user.address.postcode=XY1234/
     /user.address.postcode=AB4321/
 /color=green
     /user.address.postcode=XY1234/
     /user.address.postcode=CV3344/
     /user.address.postcode=GH6732/

Take note:

  • partitionBy must point to a string field.

  • Partitioning removes the partition fields from the schema. Their values are stored in the subdirectory names instead of in the Parquet file.

  • Partitioning must not leave the schema empty. If partitioning removes every field of the message, you will get an error.

  • Partitioned directories can be filtered effectively during reading.
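The rules above can be illustrated with a small plain-Scala model. It is not parquet4s code: FlatRecord and partition are hypothetical, and a record is represented as a map from dotted column path to string value. The model builds the relative directory for a record, drops the partition fields from what remains in the file, and rejects an empty remainder.

```scala
object PartitionLayoutSketch {
  // Hypothetical stand-in for an entity: dotted column path -> string value.
  type FlatRecord = Map[String, String]

  // Returns the relative partition directory and the fields left for the Parquet file,
  // or an error message when a partition path is missing or nothing would remain.
  def partition(record: FlatRecord, partitionBy: Seq[String]): Either[String, (String, FlatRecord)] = {
    val missing   = partitionBy.filterNot(record.contains)
    val remaining = record -- partitionBy
    if (missing.nonEmpty) Left(s"No string field at: ${missing.mkString(", ")}")
    else if (remaining.isEmpty) Left("Partitioning would leave an empty schema")
    else Right((partitionBy.map(p => s"$p=${record(p)}").mkString("/"), remaining))
  }
}
```

For a record with color "blue" and user.address.postcode "XY1234", this model yields the directory color=blue/user.address.postcode=XY1234, matching the first branch of the tree shown earlier.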

Attributes

partitionBy

ColumnPaths to partition by

Inherited from:
Builder
def postWriteHandler(postWriteHandler: F => T): Self

Adds a handler that is invoked after each chunk of records is written. The handler exposes some of the internal state of the flow and is intended for lower-level monitoring and control.

If you want postWriteHandler to be invoked after every single element is written, reduce the chunk size to 1 via chunkSize.
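The relationship between chunk size and handler invocations can be modelled with plain Scala collections. This is a simplified illustration, not the library's implementation: writeInChunks is a hypothetical helper, the write itself is omitted, and the handler receives the chunk rather than the flow's internal state.

```scala
object ChunkHandlerSketch {
  // Splits records into chunks of at most `chunkSize` and calls `handler` once per chunk.
  // Returns the number of handler invocations.
  def writeInChunks[T](records: Seq[T], chunkSize: Int)(handler: Seq[T] => Unit): Int = {
    var invocations = 0
    records.grouped(chunkSize).foreach { chunk =>
      // A real writer would persist the chunk here before notifying the handler.
      handler(chunk)
      invocations += 1
    }
    invocations
  }
}
```

In this model, five records with a chunk size of 2 trigger the handler three times, while a chunk size of 1 triggers it once per record.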

Attributes

postWriteHandler

an effect called after writing a chunk of records, receiving a snapshot of the internal state of the flow as a parameter.

Inherited from:
Builder