Class

com.coxautodata.waimak.rdbm.ingestion

RDBMExtractionTableConfig

case class RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None) extends Product with Serializable

Table configuration used for RDBM extraction.

tableName

The name of the table

pkCols

Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

lastUpdatedColumn

Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

maxRowsPerPartition

Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.
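As an illustration, a minimal sketch of how this case class might be constructed; the table and column names below are hypothetical:

    import com.coxautodata.waimak.rdbm.ingestion.RDBMExtractionTableConfig

    // Rely on the RDBMExtractor implementation to discover the primary key
    // and last updated column itself (hypothetical table name)
    val simpleConfig = RDBMExtractionTableConfig(tableName = "customers")

    // Supply everything explicitly and cap each partition at one million rows
    // (hypothetical column names)
    val tunedConfig = RDBMExtractionTableConfig(
      tableName = "orders",
      pkCols = Some(Seq("order_id")),
      lastUpdatedColumn = Some("last_modified"),
      maxRowsPerPartition = Some(1000000)
    )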

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None)

    tableName

    The name of the table

    pkCols

    Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

    lastUpdatedColumn

    Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

    maxRowsPerPartition

    Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.
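To make the effect of maxRowsPerPartition concrete, the following is a hedged sketch of how such a value can be turned into predicates for spark.read.jdbc. It illustrates the general technique only, not this library's actual implementation; the rangePredicates helper, the assumption of a roughly contiguous numeric primary key, and the connection details are all hypothetical:

    import java.util.Properties
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical helper: split a numeric primary key range into WHERE-clause
    // predicates so each partition reads at most maxRowsPerPartition rows,
    // assuming ids are roughly contiguous. Not the library's actual code.
    def rangePredicates(pkCol: String, minPk: Long, maxPk: Long,
                        maxRowsPerPartition: Int): Array[String] =
      (minPk to maxPk by maxRowsPerPartition.toLong).map { lower =>
        val upper = math.min(lower + maxRowsPerPartition - 1, maxPk)
        s"$pkCol >= $lower and $pkCol <= $upper" // one predicate per partition
      }.toArray

    val spark = SparkSession.builder().appName("rdbm-extraction").getOrCreate()

    val props = new Properties()
    props.setProperty("user", "extract_user") // hypothetical credentials
    props.setProperty("password", "secret")

    // Each predicate becomes one partition of the resulting DataFrame, so the
    // number of concurrent JDBC connections is bounded by how many partitions
    // are read in parallel (and hence by the executor count).
    val orders: DataFrame = spark.read.jdbc(
      url = "jdbc:postgresql://db-host:5432/sales", // hypothetical URL
      table = "orders",
      predicates = rangePredicates("order_id", minPk = 1L, maxPk = 5000000L,
        maxRowsPerPartition = 1000000),
      connectionProperties = props
    )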

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  8. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  9. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  10. val lastUpdatedColumn: Option[String]

    Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

  11. val maxRowsPerPartition: Option[Int]

    Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will only have one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.

  12. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. final def notify(): Unit

    Definition Classes
    AnyRef
  14. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  15. val pkCols: Option[Seq[String]]

    Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of getting this information itself)

  16. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  17. val tableName: String

    The name of the table

  18. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from AnyRef

Inherited from Any
