com.ebiznext.comet.job.convert

Parquet2CSV

Related Docs: object Parquet2CSV | package convert

class Parquet2CSV extends SparkJob

Convert parquet files to CSV. The folder hierarchy should be of the form /input_folder/domain/schema/part*.parquet. Once converted, the CSV output is written to /output_folder/domain/schema.csv. When the specified number of partitions is 1, /output_folder/domain/schema.csv is a single file containing the data; otherwise, it is a folder containing the part*.csv files. When output_folder is not specified, the input_folder is used as the base output folder.
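In plain Spark terms, the conversion described above amounts to the following sketch (the paths, the header option, and the single-partition case are illustrative, not taken from this page):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the conversion this job performs, using plain Spark APIs.
// Paths follow the /input_folder/domain/schema layout described above.
val spark = SparkSession.builder()
  .appName("parquet2csv-sketch")
  .master("local[*]")
  .getOrCreate()

val df = spark.read.parquet("/input_folder/domain/schema")

df.coalesce(1) // one partition, so schema.csv holds all the data in a single part file
  .write
  .option("header", "true")
  .csv("/output_folder/domain/schema.csv") // with more partitions, this is a folder of part*.csv files
```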

Linear Supertypes
SparkJob, StrictLogging, AnyRef, Any

Instance Constructors

  1. new Parquet2CSV(config: Parquet2CSVConfig, storageHandler: StorageHandler)(implicit settings: Settings)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. val logger: Logger

    Attributes
    protected
    Definition Classes
    StrictLogging
  13. def name: String

    Definition Classes
    Parquet2CSV → SparkJob
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame

    Definition Classes
    SparkJob
  18. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "year", "month", "day", "hour" and "minute" in the dataset, and their values are set to the current date/time.

    dataset

    : Input dataset

    partition

    : list of columns to use for partitioning.

    returns

    A DataFrameWriter partitioned by the given columns

    Definition Classes
    SparkJob
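In plain Spark terms, partitioning by the reserved ingestion-time columns amounts to something like the following sketch (the renaming and the current date/time values follow the description above; the exact implementation may differ):

```scala
import java.time.LocalDateTime
import org.apache.spark.sql.functions.lit

// comet_year / comet_month / comet_day become "year" / "month" / "day",
// filled with the current date/time, then used as partition columns.
val now = LocalDateTime.now()
val writer = df
  .withColumn("year", lit(now.getYear))
  .withColumn("month", lit(now.getMonthValue))
  .withColumn("day", lit(now.getDayOfMonth))
  .write
  .partitionBy("year", "month", "day")
```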
  19. def run(): Try[SparkSession]

    Just to force any Spark job to implement its entry point within the "run" method.

    returns

    : Spark Session used for the job

    Definition Classes
    Parquet2CSV → SparkJob
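A hypothetical invocation, using only the constructor and run() signatures shown on this page (the Parquet2CSVConfig, StorageHandler and implicit Settings values are assumed to be built elsewhere, since their fields are not documented here):

```scala
import scala.util.{Failure, Success}

// config: Parquet2CSVConfig, storageHandler: StorageHandler, and an implicit
// settings: Settings are assumed to be in scope; their construction is not
// shown on this page.
new Parquet2CSV(config, storageHandler).run() match {
  case Success(session)   => session.stop() // job succeeded; release the Spark session
  case Failure(exception) => throw exception // surface the failure
}
```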
  20. lazy val session: SparkSession

    Definition Classes
    SparkJob
  21. implicit val settings: Settings

    Definition Classes
    Parquet2CSV → SparkJob
  22. lazy val sparkEnv: SparkEnv

    Definition Classes
    SparkJob
  23. val storageHandler: StorageHandler

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. def toString(): String

    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
