class HBaseContext extends Serializable with Logging
HBaseContext is a façade for HBase operations like bulk put, get, increment, delete, and scan.
HBaseContext takes on the responsibilities of disseminating the configuration information to the workers and managing the life cycle of Connections.
- Annotations
- @Public()
- Linear Supertypes
  - Logging, Serializable, AnyRef, Any
Instance Constructors
- new HBaseContext(sc: SparkContext, config: Configuration, tmpHdfsConfgFile: String = null)
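A minimal construction sketch. It assumes the class lives in the org.apache.hadoop.hbase.spark package (the Apache hbase-spark module) and that an hbase-site.xml is on the classpath; the application name is illustrative:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.{SparkConf, SparkContext}

// Any existing SparkContext works; the app name here is illustrative
val sc = new SparkContext(new SparkConf().setAppName("HBaseContextExample"))

// HBaseConfiguration.create() picks up hbase-site.xml from the classpath
val config = HBaseConfiguration.create()

// The HBaseContext broadcasts the configuration to the workers and
// manages Connection life cycles for the bulk operations below
val hbaseContext = new HBaseContext(sc, config)
```

The sketches for the value members below reuse this sc and hbaseContext.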
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- var appliedCredentials: Boolean
- def applyCreds[T](): Unit
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- val broadcastedConf: Broadcast[SerializableWritable[Configuration]]
- def bulkDelete[T](rdd: RDD[T], tableName: TableName, f: (T) ⇒ Delete, batchSize: Integer): Unit
  A simple abstraction over the HBaseContext.foreachPartition method. It adds support for a user to take an RDD, generate Deletes from it, and send them to HBase. The complexity of managing the Connection is removed from the developer. See the sketch after this entry.
  - rdd: Original RDD with data to iterate over
  - tableName: The name of the table to delete from
  - f: Function to convert a value in the RDD to an HBase Delete
  - batchSize: The number of Deletes to batch before sending to HBase
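A minimal usage sketch, reusing sc and hbaseContext from the constructor example; the table name, row keys, and batch size are illustrative assumptions:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical RDD of row keys to remove
val keysToDelete = sc.parallelize(Seq("row1", "row2", "row3"))

hbaseContext.bulkDelete[String](
  keysToDelete,
  TableName.valueOf("myTable"),                   // assumed table name
  (rowKey: String) => new Delete(Bytes.toBytes(rowKey)),
  2)                                              // send Deletes in batches of two
```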
- def bulkGet[T, U](tableName: TableName, batchSize: Integer, rdd: RDD[T], makeGet: (T) ⇒ Get, convertResult: (Result) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]
  A simple abstraction over the HBaseContext.mapPartition method. It adds support for a user to take an RDD and generate a new RDD based on Gets and the results they bring back from HBase. See the sketch after this entry.
  - tableName: The name of the table to get from
  - batchSize: The number of Gets to batch before sending to HBase
  - rdd: Original RDD with data to iterate over
  - makeGet: Function to convert a value in the RDD to an HBase Get
  - convertResult: Function to convert the HBase Result object to whatever the user wants to put in the resulting RDD
  - returns: New RDD that is created by the Gets to HBase
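A minimal usage sketch, reusing sc and hbaseContext from above; the table, column family, and qualifier names are illustrative assumptions, and the converter assumes the requested cell exists:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Get, Result}
import org.apache.hadoop.hbase.util.Bytes

val rowKeys = sc.parallelize(Seq("row1", "row2"))

// Produces an RDD of (rowKey, cell value) pairs
val values = hbaseContext.bulkGet[String, (String, String)](
  TableName.valueOf("myTable"),
  2,                                              // batch two Gets per round trip
  rowKeys,
  (rowKey: String) => new Get(Bytes.toBytes(rowKey)),
  (result: Result) =>
    // Assumes the row and cell exist; a real converter should handle nulls
    (Bytes.toString(result.getRow),
     Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))))
```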
- def bulkPut[T](rdd: RDD[T], tableName: TableName, f: (T) ⇒ Put): Unit
  A simple abstraction over the HBaseContext.foreachPartition method. It adds support for a user to take an RDD, generate Puts from it, and send them to HBase. The complexity of managing the Connection is removed from the developer. See the sketch after this entry.
  - rdd: Original RDD with data to iterate over
  - tableName: The name of the table to put into
  - f: Function to convert a value in the RDD to an HBase Put
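A minimal usage sketch, reusing sc and hbaseContext from above; the table, column family, and qualifier names are illustrative assumptions:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical RDD of (rowKey, value) pairs to write
val rows = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))

hbaseContext.bulkPut[(String, String)](
  rows,
  TableName.valueOf("myTable"),                   // assumed table name
  { case (rowKey, value) =>
    new Put(Bytes.toBytes(rowKey))
      .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  })
```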
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def close(): Unit
- val config: Configuration
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def foreachPartition[T](rdd: RDD[T], f: (Iterator[T], Connection) ⇒ Unit): Unit
  A simple enrichment of the traditional Spark RDD foreachPartition. This function differs from the original in that it offers the developer access to an already connected Connection object. See the sketch after this entry.
  Note: Do not close the Connection object. All Connection management is handled outside this method.
  - rdd: Original RDD with data to iterate over
  - f: Function given an iterator over the RDD values and a Connection object to interact with HBase
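A minimal usage sketch, reusing sc and hbaseContext from above; table and column names are illustrative assumptions. Note that the Table handle is closed per partition, while the Connection is left alone as the documentation requires:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Put}
import org.apache.hadoop.hbase.util.Bytes

val rows = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))

hbaseContext.foreachPartition(rows,
  (it: Iterator[(String, String)], connection: Connection) => {
    // One Table handle per partition; close the Table, never the Connection
    val table = connection.getTable(TableName.valueOf("myTable"))
    try {
      it.foreach { case (rowKey, value) =>
        table.put(new Put(Bytes.toBytes(rowKey))
          .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value)))
      }
    } finally {
      table.close()
    }
  })
```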
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def hbaseMapPartition[K, U](configBroadcast: Broadcast[SerializableWritable[Configuration]], it: Iterator[K], mp: (Iterator[K], Connection) ⇒ Iterator[U]): Iterator[U]
  Underlying wrapper for all mapPartition functions in HBaseContext.
- def hbaseRDD(tableName: TableName, scans: Scan): RDD[(ImmutableBytesWritable, Result)]
  An overloaded version of HBaseContext.hbaseRDD that fixes the type of the resulting RDD. See the sketch after this entry.
  - tableName: The name of the table to scan
  - scans: The HBase Scan object to use to read data from HBase
  - returns: New RDD with results from scan
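A minimal usage sketch, reusing hbaseContext from above; the table name, column family, and caching size are illustrative assumptions:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes

// Scan a single column family; the caching size is a tuning choice, not a requirement
val scan = new Scan()
scan.addFamily(Bytes.toBytes("cf"))
scan.setCaching(100)

val scanRdd = hbaseContext.hbaseRDD(TableName.valueOf("myTable"), scan)

// Runs on the executors, so output appears in the executor logs
scanRdd.foreach { case (_, result) =>
  println(Bytes.toString(result.getRow))
}
```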
- def hbaseRDD[U](tableName: TableName, scan: Scan, f: ((ImmutableBytesWritable, Result)) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]
  This function will use the native HBase TableInputFormat with the given Scan object to generate a new RDD. See the sketch after this entry.
  - tableName: The name of the table to scan
  - scan: The HBase Scan object to use to read data from HBase
  - f: Function to convert a Result object from HBase into what the user wants in the final generated RDD
  - returns: New RDD with results from scan
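A minimal usage sketch of the converting overload, reusing hbaseContext from above; the table and column family names are illustrative assumptions:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes

// Same scan, but convert each (key, Result) pair to a plain String row key
val rowKeyRdd = hbaseContext.hbaseRDD[String](
  TableName.valueOf("myTable"),
  new Scan().addFamily(Bytes.toBytes("cf")),
  { case (_, result) => Bytes.toString(result.getRow) })
```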
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
  - Attributes: protected
  - Definition Classes: Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
  - Attributes: protected
  - Definition Classes: Logging
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def isTraceEnabled(): Boolean
  - Attributes: protected
  - Definition Classes: Logging
- val job: Job
- def log: Logger
  - Attributes: protected
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logName: String
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def mapPartitions[T, R](rdd: RDD[T], mp: (Iterator[T], Connection) ⇒ Iterator[R])(implicit arg0: ClassTag[R]): RDD[R]
  A simple enrichment of the traditional Spark RDD mapPartition. This function differs from the original in that it offers the developer access to an already connected Connection object. See the sketch after this entry.
  Note: Do not close the Connection object. All Connection management is handled outside this method.
  - rdd: Original RDD with data to iterate over
  - mp: Function given an iterator over the RDD values and a Connection object to interact with HBase
  - returns: New RDD generated by the user-defined function, just like normal mapPartition
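A minimal usage sketch, reusing sc and hbaseContext from above; the table and column names are illustrative assumptions. Because the returned iterator is consumed lazily, this sketch does not close the Table handle inside the function; the Connection itself must not be closed here in any case:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Get}
import org.apache.hadoop.hbase.util.Bytes

val rowKeys = sc.parallelize(Seq("row1", "row2"))

// Enrich each row key with the cell value fetched from HBase for that row
val enriched = hbaseContext.mapPartitions[String, (String, String)](rowKeys,
  (it: Iterator[String], connection: Connection) => {
    val table = connection.getTable(TableName.valueOf("myTable"))
    it.map { rowKey =>
      val result = table.get(new Get(Bytes.toBytes(rowKey)))
      // Assumes the cell exists; a real converter should handle nulls
      (rowKey, Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))
    }
  })
```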
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- val tmpHdfsConfgFile: String
- var tmpHdfsConfiguration: Configuration
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()