ParquetStreams

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def fromParquet[T](path: String, options: Options = ParquetReader.Options())(implicit arg0: ParquetRecordDecoder[T]): Source[T, NotUsed]

Creates an akka.stream.scaladsl.Source that reads Parquet data from the specified path. If there are multiple files at the path, the order in which they are loaded is determined by the underlying filesystem.
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
T
type of data that represents the schema of the Parquet data, e.g.:
```
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
```
path
URI to Parquet files, e.g.:
```
"file:///data/users"
```
returns
The source of Parquet data
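A minimal sketch of reading with fromParquet, assuming the package name com.github.mjakubowski84.parquet4s mentioned elsewhere on this page and that the required implicit ParquetRecordDecoder is derived automatically for a case class. The object name ReadUsers, the path, and the printAll helper are hypothetical and only serve the illustration.

```scala
import akka.{Done, NotUsed}
import akka.stream.Materializer
import akka.stream.scaladsl.Source
import com.github.mjakubowski84.parquet4s.ParquetStreams

import scala.concurrent.Future

// Schema type taken from the parameter description above.
case class MyData(id: Long, name: String, created: java.sql.Timestamp)

object ReadUsers {
  // Hypothetical location; any URI understood by the Hadoop client should do.
  val usersPath = "file:///data/users"

  // Describes the stream of decoded records.
  // Assumption: an implicit ParquetRecordDecoder[MyData] is resolved automatically.
  def users: Source[MyData, NotUsed] =
    ParquetStreams.fromParquet[MyData](usersPath)

  // Runs the stream and prints every record.
  // The caller supplies the Materializer (or an implicit ActorSystem on newer Akka versions).
  def printAll()(implicit materializer: Materializer): Future[Done] =
    users.runForeach(record => println(record))
}
```

Because the order of files depends on the filesystem, code that consumes this source should not rely on the order of records across files.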
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toParquetParallelUnordered[T](path: String, parallelism: Int, options: Options = ParquetWriter.Options())(implicit arg0: ParquetRecordEncoder[T], arg1: ParquetSchemaResolver[T]): Sink[T, Future[Done]]

Creates an akka.stream.scaladsl.Sink that writes Parquet data to files at the specified path. The sink splits the data into a number of files equal to parallelism. Files are written in parallel, and the order of records is not preserved.

The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
T
type of data that represents the schema of the Parquet data, e.g.:
```
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
```
path
URI to Parquet files, e.g.:
```
"file:///data/users"
```
parallelism
the number of files to create and the number of parallel threads that write them
options
set of options that define how Parquet files will be created
returns
The sink that writes Parquet files
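A minimal sketch of writing with toParquetParallelUnordered, assuming the implicit ParquetRecordEncoder and ParquetSchemaResolver are derived automatically for a case class. The object name, the sample records, and the parallelism of 4 are hypothetical choices made for illustration.

```scala
import akka.Done
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import com.github.mjakubowski84.parquet4s.ParquetStreams

import scala.concurrent.Future

// Schema type taken from the parameter description above.
case class MyData(id: Long, name: String, created: java.sql.Timestamp)

object WriteUsersInParallel {
  // Hypothetical sample records used only for illustration.
  val sample: List[MyData] =
    (1L to 100L).map(i => MyData(i, s"user-$i", new java.sql.Timestamp(i * 1000L))).toList

  // With parallelism = 4 the data is split across four files under the path.
  // Assumption: the implicit encoder and schema resolver for MyData are resolved automatically.
  def sink: Sink[MyData, Future[Done]] =
    ParquetStreams.toParquetParallelUnordered[MyData](
      path = "file:///data/users",
      parallelism = 4
    )

  // The returned Future completes when the stream has finished writing.
  def run()(implicit materializer: Materializer): Future[Done] =
    Source(sample).runWith(sink)
}
```

Since records are written unordered, this sink suits data whose order is irrelevant or can be restored later from a column such as id.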
def toParquetSequentialWithFileSplit[T](path: String, maxRecordsPerFile: Long, options: Options = ParquetWriter.Options())(implicit arg0: ParquetRecordEncoder[T], arg1: ParquetSchemaResolver[T]): Sink[T, Future[Done]]

Creates an akka.stream.scaladsl.Sink that writes Parquet data to files at the specified path. The sink splits the output sequentially into files. Each file contains at most maxRecordsPerFile records. It is recommended to set maxRecordsPerFile to a multiple of com.github.mjakubowski84.parquet4s.ParquetWriter.Options.rowGroupSize.

The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
T
type of data that represents the schema of the Parquet data, e.g.:
```
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
```
path
URI to Parquet files, e.g.:
```
"file:///data/users"
```
maxRecordsPerFile
the maximum number of records written to a single file
options
set of options that define how Parquet files will be created
returns
The sink that writes Parquet files
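A minimal sketch of writing with toParquetSequentialWithFileSplit, assuming the implicit ParquetRecordEncoder and ParquetSchemaResolver are derived automatically for a case class. The object name, the limit of one million records, and the expectedFileCount helper are hypothetical; the helper is only a rough estimate that follows from the description above, not something the library provides.

```scala
import akka.Done
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import com.github.mjakubowski84.parquet4s.ParquetStreams

import scala.concurrent.Future

// Schema type taken from the parameter description above.
case class MyData(id: Long, name: String, created: java.sql.Timestamp)

object WriteUsersWithFileSplit {
  // Illustrative limit; the description above recommends aligning it with rowGroupSize.
  val maxRecordsPerFile: Long = 1000000L

  // Assumption: the implicit encoder and schema resolver for MyData are resolved automatically.
  def sink: Sink[MyData, Future[Done]] =
    ParquetStreams.toParquetSequentialWithFileSplit[MyData](
      path = "file:///data/users",
      maxRecordsPerFile = maxRecordsPerFile
    )

  // Rough estimate of how many files to expect: recordCount / maxRecordsPerFile, rounded up.
  def expectedFileCount(recordCount: Long): Long =
    (recordCount + maxRecordsPerFile - 1) / maxRecordsPerFile

  // The returned Future completes when the stream has finished writing.
  def run(records: List[MyData])(implicit materializer: Materializer): Future[Done] =
    Source(records).runWith(sink)
}
```

Unlike the parallel sink, this one writes one file at a time, so records keep the order in which they arrive.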
def toParquetSingleFile[T](path: String, options: Options = ParquetWriter.Options())(implicit arg0: ParquetRecordEncoder[T], arg1: ParquetSchemaResolver[T]): Sink[T, Future[Done]]

Creates an akka.stream.scaladsl.Sink that writes Parquet data to a single file at the specified path (including the file name).
The path can refer to a local file, HDFS, AWS S3, Google Storage, Azure, etc. Refer to the Hadoop client documentation or your data provider to learn how to configure the connection.
T
type of data that represents the schema of the Parquet data, e.g.:
```
case class MyData(id: Long, name: String, created: java.sql.Timestamp)
```
path
URI to the Parquet file, e.g.:
```
"file:///data/users/users-2019-01-01.parquet"
```
options
set of options that define how the Parquet file will be created
returns
The sink that writes Parquet file
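A minimal sketch of writing with toParquetSingleFile, passing the default ParquetWriter.Options() explicitly to show where custom options would go. It assumes the implicit ParquetRecordEncoder and ParquetSchemaResolver are derived automatically for a case class; the object name and the run helper are hypothetical.

```scala
import akka.Done
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import com.github.mjakubowski84.parquet4s.{ParquetStreams, ParquetWriter}

import scala.concurrent.Future

// Schema type taken from the parameter description above.
case class MyData(id: Long, name: String, created: java.sql.Timestamp)

object WriteUsersToSingleFile {
  // The path must include the file name, as in the example above.
  val targetFile = "file:///data/users/users-2019-01-01.parquet"

  // Assumption: the implicit encoder and schema resolver for MyData are resolved automatically.
  def sink: Sink[MyData, Future[Done]] =
    ParquetStreams.toParquetSingleFile[MyData](
      path = targetFile,
      options = ParquetWriter.Options() // same as the default argument
    )

  // The returned Future completes when the stream has finished writing.
  def run(records: List[MyData])(implicit materializer: Materializer): Future[Done] =
    Source(records).runWith(sink)
}
```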
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package parquet4s

object ParquetStreams
