Class

io.smartdatalake.workflow.dataobject

ExcelOptions

case class ExcelOptions(sheetName: String, numLinesToSkip: Option[Int] = None, startColumn: Option[Int] = None, endColumn: Option[Int] = None, rowLimit: Option[Int] = None, useHeader: Boolean = true, treatEmptyValuesAsNulls: Option[Boolean] = Some(true), inferSchema: Option[Boolean] = Some(true), timestampFormat: Option[String] = Some("dd-MM-yyyy HH:mm:ss"), dateFormat: Option[String] = None, maxRowsInMemory: Option[Int] = None, excerptSize: Option[Int] = None) extends Product with Serializable

Options passed to org.apache.spark.sql.DataFrameReader and org.apache.spark.sql.DataFrameWriter for reading and writing Microsoft Excel files. Excel support is provided by the spark-excel project (see link below).

sheetName

The name of the Excel sheet to read from or write to. This option is required.

numLinesToSkip

The number of rows in the Excel sheet to skip before any data is read. This option must not be set for writing.

startColumn

The first column of the specified Excel sheet to read from (1-based indexing). This option must not be set for writing.

endColumn

The last column to read, as the counterpart of startColumn. This option may no longer have any effect: spark-excel (crealytics) now selects the cell range through its dataAddress option.
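The note above refers to dataAddress. As a minimal sketch, assuming the 1-based startColumn and the numLinesToSkip row count would be translated into a spark-excel start cell of the form 'Sheet1'!C3 (the quoted-sheet address syntax shown in the spark-excel README), the conversion could look like this. The helpers columnLetters and startAddress are hypothetical and are not taken from the SmartDataLake source; endColumn is deliberately left out.

```scala
// Hypothetical helper: convert a 1-based column index to Excel column letters
// (1 -> A, 26 -> Z, 27 -> AA, 703 -> AAA).
def columnLetters(col: Int): String = {
  require(col >= 1, s"column index must be >= 1, got $col")
  val sb = new StringBuilder
  var n = col
  while (n > 0) {
    val rem = (n - 1) % 26          // 0 = 'A', 25 = 'Z'
    sb.append(('A' + rem).toChar)
    n = (n - 1) / 26                // move to the next, more significant letter
  }
  sb.reverse.toString               // letters were collected least significant first
}

// Hypothetical helper (assumption, not SDL code): numLinesToSkip whole rows sit above
// the data, so the first data row is numLinesToSkip + 1 in Excel's 1-based row numbering.
def startAddress(sheetName: String, numLinesToSkip: Option[Int], startColumn: Option[Int]): String = {
  val firstRow = numLinesToSkip.getOrElse(0) + 1
  val firstCol = columnLetters(startColumn.getOrElse(1))
  s"'$sheetName'!$firstCol$firstRow"
}

val address = startAddress("Sheet1", numLinesToSkip = Some(2), startColumn = Some(3))
```

With two skipped rows and startColumn = 3, the sketch yields the start cell 'Sheet1'!C3.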

rowLimit

Limits the rows returned on read to the first rowLimit rows. This limit is applied after numLinesToSkip.
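The ordering stated above (skip first, then limit) can be modelled on a plain sequence of rows. This is a toy sketch: selectRows is a hypothetical helper, rows are strings rather than parsed Excel cells, and header handling (useHeader) is ignored.

```scala
// Hypothetical helper: apply numLinesToSkip first, then rowLimit, as documented above.
def selectRows[A](rows: Seq[A], numLinesToSkip: Option[Int], rowLimit: Option[Int]): Seq[A] = {
  val afterSkip = rows.drop(numLinesToSkip.getOrElse(0))       // unset skip = skip nothing
  rowLimit.map(n => afterSkip.take(n)).getOrElse(afterSkip)    // unset limit = keep everything
}

val sheetRows = Seq("title", "blank", "r1", "r2", "r3", "r4")
val selected = selectRows(sheetRows, numLinesToSkip = Some(2), rowLimit = Some(3))
```

Because the limit counts rows after the skipped ones, selected contains r1, r2 and r3, not the title and blank rows.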

useHeader

If true, the first row of the Excel sheet specifies the column names (default: true). Unlike the Option-typed parameters, this is a plain Boolean, so it always has a value.

treatEmptyValuesAsNulls

If true, empty cells are parsed as null values (default: true).

inferSchema

If true, the schema of the Excel sheet is inferred automatically (default: true).

timestampFormat

A format string specifying the format to use when writing timestamps (default: dd-MM-yyyy HH:mm:ss).

dateFormat

A format string specifying the format to use when writing dates.
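The default timestampFormat can be tried out with java.time.format.DateTimeFormatter to see what it produces. This assumes spark-excel interprets the pattern letters used here (dd, MM, yyyy, HH, mm, ss) the same way java.time does. The dateFormat value yyyy-MM-dd is a hypothetical choice, since dateFormat has no default.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

// Default timestampFormat of ExcelOptions: day, month, 4-digit year, then 24-hour time.
val timestampFormat = "dd-MM-yyyy HH:mm:ss"
// Hypothetical dateFormat value; ExcelOptions leaves dateFormat unset by default.
val dateFormat = "yyyy-MM-dd"

val ts = LocalDateTime.of(2021, 3, 5, 14, 7, 9)
val renderedTimestamp = ts.format(DateTimeFormatter.ofPattern(timestampFormat))

val date = LocalDate.of(2021, 3, 5)
val renderedDate = date.format(DateTimeFormatter.ofPattern(dateFormat))
```

Note the day-first order of the default: 5 March 2021 at 14:07:09 renders as 05-03-2021 14:07:09.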

maxRowsInMemory

The number of rows kept in memory at a time. If set, a streaming reader is used, which can help with large files.

excerptSize

The number of rows sampled for schema inference.
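Why the sample size matters can be shown with a deliberately simplified inference rule. This is not spark-excel's algorithm: inferColumnType is a hypothetical helper that labels a column double only if every sampled value parses as a number, and string otherwise.

```scala
import scala.util.Try

// Hypothetical toy inference: only the first excerptSize values are inspected.
def inferColumnType(values: Seq[String], excerptSize: Int): String = {
  val sample = values.take(excerptSize)
  if (sample.nonEmpty && sample.forall(v => Try(v.toDouble).isSuccess)) "double" else "string"
}

val column = Seq("1", "2.5", "3", "n/a")
val smallSample = inferColumnType(column, excerptSize = 3) // "n/a" is never seen
val fullSample = inferColumnType(column, excerptSize = 4)  // "n/a" forces string
```

A sample that is too small can miss a non-numeric value further down the sheet and infer a type that later rows do not fit.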

See also

https://github.com/crealytics/spark-excel

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new ExcelOptions(sheetName: String, numLinesToSkip: Option[Int] = None, startColumn: Option[Int] = None, endColumn: Option[Int] = None, rowLimit: Option[Int] = None, useHeader: Boolean = true, treatEmptyValuesAsNulls: Option[Boolean] = Some(true), inferSchema: Option[Boolean] = Some(true), timestampFormat: Option[String] = Some("dd-MM-yyyy HH:mm:ss"), dateFormat: Option[String] = None, maxRowsInMemory: Option[Int] = None, excerptSize: Option[Int] = None)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val dateFormat: Option[String]

    A format string specifying the format to use when writing dates.

  7. val endColumn: Option[Int]

    The last column to read, as the counterpart of startColumn. This option may no longer have any effect: spark-excel (crealytics) now selects the cell range through its dataAddress option.

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. val excerptSize: Option[Int]

    The number of rows sampled for schema inference.

  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. val inferSchema: Option[Boolean]

    If true, the schema of the Excel sheet is inferred automatically (default: true).

  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val maxRowsInMemory: Option[Int]

    The number of rows kept in memory at a time. If set, a streaming reader is used, which can help with large files.

  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def notify(): Unit

    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  18. val numLinesToSkip: Option[Int]

    The number of rows in the Excel sheet to skip before any data is read. This option must not be set for writing.

  19. val rowLimit: Option[Int]

    Limits the rows returned on read to the first rowLimit rows. This limit is applied after numLinesToSkip.

  20. val sheetName: String

    The name of the Excel sheet to read from or write to. This option is required.

  21. val startColumn: Option[Int]

    The first column of the specified Excel sheet to read from (1-based indexing). This option must not be set for writing.

  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. val timestampFormat: Option[String]

    A format string specifying the format to use when writing timestamps (default: dd-MM-yyyy HH:mm:ss).

  24. def toMap(schema: Option[StructType]): Map[String, Option[Any]]

  25. val treatEmptyValuesAsNulls: Option[Boolean]

    If true, empty cells are parsed as null values (default: true).

  26. val useHeader: Boolean

    If true, the first row of the Excel sheet specifies the column names (default: true). Unlike the Option-typed parameters, this is a plain Boolean, so it always has a value.

  27. def validate(id: DataObjectId): Unit

  28. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
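The toMap(schema) member listed above returns Map[String, Option[Any]], while org.apache.spark.sql.DataFrameReader.options accepts a Map[String, String]. A minimal sketch of one way such a map could be narrowed, assuming unset (None) entries are simply omitted. The keys and values in rawOptions are illustrative and are not the actual output of toMap; toReaderOptions is a hypothetical helper.

```scala
// Illustrative stand-in for a Map[String, Option[Any]] like the one toMap returns;
// the keys and values are assumptions, not SDL's real option keys.
val rawOptions: Map[String, Option[Any]] = Map(
  "sheetName" -> Some("Sheet1"),
  "useHeader" -> Some(true),
  "maxRowsInMemory" -> None,
  "excerptSize" -> Some(10)
)

// Hypothetical helper: drop unset (None) entries and render the rest as strings,
// the shape accepted by DataFrameReader.options(Map[String, String]).
def toReaderOptions(raw: Map[String, Option[Any]]): Map[String, String] =
  raw.collect { case (key, Some(value)) => key -> value.toString }

val readerOptions = toReaderOptions(rawOptions)
// With Spark and spark-excel on the classpath (not required by this sketch),
// the result could be passed on as:
//   spark.read.format("com.crealytics.spark.excel").options(readerOptions).load(path)
```

Omitting None entries, rather than passing them as empty strings, leaves the reader's own defaults in effect for options that were never set.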
