Class

com.coxautodata.waimak.rdbm.ingestion

RDBMExtractionTableConfig

case class RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None) extends Product with Serializable

Table configuration used for RDBM extraction.

tableName

The name of the table

pkCols

Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

lastUpdatedColumn

Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

maxRowsPerPartition

Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.
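As an illustration, a minimal sketch of how this case class might be constructed; the table and column names below are hypothetical:

    import com.coxautodata.waimak.rdbm.ingestion.RDBMExtractionTableConfig

    // Rely on the RDBMExtractor implementation to discover the primary key
    // and last updated column itself (hypothetical table name)
    val simpleConfig = RDBMExtractionTableConfig(tableName = "customers")

    // Supply everything explicitly and cap each partition at one million rows
    // (hypothetical column names)
    val tunedConfig = RDBMExtractionTableConfig(
      tableName = "orders",
      pkCols = Some(Seq("order_id")),
      lastUpdatedColumn = Some("last_modified"),
      maxRowsPerPartition = Some(1000000)
    )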

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None)

    tableName

    The name of the table

    pkCols

    Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

    lastUpdatedColumn

    Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

    maxRowsPerPartition

    Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.
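To make the effect of maxRowsPerPartition concrete, the following is a hedged sketch of how such a value can be turned into predicates for spark.read.jdbc. It illustrates the general technique only, not this library's actual implementation; the rangePredicates helper, the assumption of a roughly contiguous numeric primary key, and the connection details are all hypothetical:

    import java.util.Properties
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical helper: split a numeric primary key range into WHERE-clause
    // predicates so each partition reads at most maxRowsPerPartition rows,
    // assuming ids are roughly contiguous. Not the library's actual code.
    def rangePredicates(pkCol: String, minPk: Long, maxPk: Long,
                        maxRowsPerPartition: Int): Array[String] =
      (minPk to maxPk by maxRowsPerPartition.toLong).map { lower =>
        val upper = math.min(lower + maxRowsPerPartition - 1, maxPk)
        s"$pkCol >= $lower and $pkCol <= $upper" // one predicate per partition
      }.toArray

    val spark = SparkSession.builder().appName("rdbm-extraction").getOrCreate()

    val props = new Properties()
    props.setProperty("user", "extract_user") // hypothetical credentials
    props.setProperty("password", "secret")

    // Each predicate becomes one partition of the resulting DataFrame, so the
    // number of concurrent JDBC connections is bounded by how many partitions
    // are read in parallel (and hence by the executor count).
    val orders: DataFrame = spark.read.jdbc(
      url = "jdbc:postgresql://db-host:5432/sales", // hypothetical URL
      table = "orders",
      predicates = rangePredicates("order_id", minPk = 1L, maxPk = 5000000L,
        maxRowsPerPartition = 1000000),
      connectionProperties = props
    )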

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  8. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  9. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  10. val lastUpdatedColumn: Option[String]

    Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

  11. val maxRowsPerPartition: Option[Int]

    Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.

  12. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. final def notify(): Unit

    Definition Classes
    AnyRef
  14. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  15. val pkCols: Option[Seq[String]]

    Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

  16. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  17. val tableName: String

    The name of the table

  18. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from AnyRef

Inherited from Any
