A simple abstraction over the HBaseContext.foreachPartition method.
It allows a user to take an RDD, generate Deletes from it, and send them to HBase. The complexity of managing the Connection is removed from the developer
Original RDD with data to iterate over
The name of the table to delete from
Function to convert a value in the RDD to an HBase Delete
The number of Deletes to batch before sending to HBase
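A minimal sketch of how bulkDelete might be called, assuming the hbase-spark API where tables are identified by a TableName. It needs a running Spark application and HBase cluster to actually execute; the table name "myTable" and the row keys are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object BulkDeleteExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkDeleteExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // An RDD of row keys to delete; "myTable" is a hypothetical table name.
    val rdd = sc.parallelize(Seq(Bytes.toBytes("row1"), Bytes.toBytes("row2")))

    hbaseContext.bulkDelete[Array[Byte]](
      rdd,
      TableName.valueOf("myTable"),
      rowKey => new Delete(rowKey), // convert each RDD value to a Delete
      4)                            // batch up to 4 Deletes per round trip

    sc.stop()
  }
}
```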
A simple abstraction over the HBaseContext.mapPartition method.
It allows a user to take an RDD and generate a new RDD based on Gets and the results they bring back from HBase
The name of the table to get from
Original RDD with data to iterate over
Function to convert a value in the RDD to an HBase Get
Function to convert the HBase Result object into whatever the user wants to put in the resulting RDD
Returns a new RDD created from the results of the Gets to HBase
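A sketch of a bulkGet call under the same assumptions (hbase-spark API, running cluster, hypothetical table "myTable"); here each row key becomes a Get, and each Result is converted back to a String row key for the resulting RDD.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Get, Result}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object BulkGetExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkGetExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    val rowKeys = sc.parallelize(Seq(Bytes.toBytes("row1"), Bytes.toBytes("row2")))

    val getRdd = hbaseContext.bulkGet[Array[Byte], String](
      TableName.valueOf("myTable"),
      2,                                                  // batch size
      rowKeys,                                            // original RDD
      rowKey => new Get(rowKey),                          // value -> Get
      (result: Result) => Bytes.toString(result.getRow))  // Result -> RDD element

    getRdd.collect().foreach(println)
    sc.stop()
  }
}
```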
A simple abstraction over the HBaseContext.foreachPartition method.
It allows a user to take an RDD, generate Puts from it, and send them to HBase. The complexity of managing the Connection is removed from the developer
Original RDD with data to iterate over
The name of the table to put into
Function to convert a value in the RDD to an HBase Put
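A sketch of a bulkPut call, again assuming the hbase-spark API and a running cluster; the table name "myTable" and the column family/qualifier "cf"/"col" are hypothetical names for illustration.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object BulkPutExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkPutExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // (row key, value) pairs to write
    val rdd = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))

    hbaseContext.bulkPut[(String, String)](
      rdd,
      TableName.valueOf("myTable"),
      { case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        // "cf"/"col" are hypothetical column family and qualifier names
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
        put
      })

    sc.stop()
  }
}
```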
A simple enrichment of the traditional Spark RDD foreachPartition. This function differs from the original in that it offers the developer access to an already connected Connection object
Note: Do not close the Connection object. All Connection management is handled outside this method
Original RDD with data to iterate over
Function to be given an iterator to iterate through the RDD values and a Connection object to interact with HBase
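A sketch of the enriched foreachPartition, under the same assumptions as above (hbase-spark API, running cluster, hypothetical names). It shows the one rule the note calls out: close any Table you open, but never the Connection itself.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, Put}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object ForeachPartitionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ForeachPartitionExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    val rdd = sc.parallelize(Seq("row1", "row2"))

    hbaseContext.foreachPartition(rdd,
      (it: Iterator[String], connection: Connection) => {
        val table = connection.getTable(TableName.valueOf("myTable"))
        it.foreach { rowKey =>
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v"))
          table.put(put)
        }
        // Close the Table, but never the Connection; HBaseContext manages it.
        table.close()
      })

    sc.stop()
  }
}
```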
Underlying wrapper for all mapPartition functions in HBaseContext
An overloaded version of HBaseContext hbaseRDD that defines the type of the resulting RDD
the name of the table to scan
the HBase scan object to use to read data from HBase
New RDD with results from scan
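A sketch of this hbaseRDD variant, assuming the hbase-spark API where it returns an RDD of (ImmutableBytesWritable, Result) pairs; the table name and the caching value are illustrative.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.{SparkConf, SparkContext}

object HBaseRDDExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseRDDExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    val scan = new Scan()
    scan.setCaching(100) // rows fetched per RPC; tune for the workload

    // Yields an RDD of (ImmutableBytesWritable, Result) pairs from the scan.
    val rdd = hbaseContext.hbaseRDD(TableName.valueOf("myTable"), scan)
    println(rdd.count())

    sc.stop()
  }
}
```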
This function will use the native HBase TableInputFormat with the given scan object to generate a new RDD
the name of the table to scan
the HBase scan object to use to read data from HBase
Function to convert a Result object from HBase into what the user wants in the final generated RDD
New RDD with results from scan
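A sketch of the scan-plus-conversion form of hbaseRDD, under the same assumptions as the examples above; here each scanned (key, Result) pair is reduced to a plain String row key.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseScanToRowKeys {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseScanToRowKeys"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // Convert each scanned (key, Result) pair into a plain String row key.
    val rowKeys = hbaseContext.hbaseRDD(
      TableName.valueOf("myTable"),
      new Scan(),
      (r: (ImmutableBytesWritable, Result)) => Bytes.toString(r._2.getRow))

    rowKeys.collect().foreach(println)
    sc.stop()
  }
}
```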
A simple enrichment of the traditional Spark RDD mapPartition. This function differs from the original in that it offers the developer access to an already connected Connection object
Note: Do not close the Connection object. All Connection management is handled outside this method
Original RDD with data to iterate over
Function to be given an iterator to iterate through the RDD values and a Connection object to interact with HBase
Returns a new RDD generated by the user-defined function, just like a normal mapPartition
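A sketch of the enriched mapPartitions (named `mapPartitions` in the hbase-spark API this assumes), checking whether each row key exists in a hypothetical table. Note that the iterator must be materialized before the Table is closed, and the Connection is never closed by user code.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, Get}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MapPartitionsExample"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    val rdd = sc.parallelize(Seq("row1", "row2"))

    val existsRdd = hbaseContext.mapPartitions[String, Boolean](rdd,
      (it: Iterator[String], connection: Connection) => {
        val table = connection.getTable(TableName.valueOf("myTable"))
        // Materialize before closing the Table; do not close the Connection.
        val exists = it.map { rowKey =>
          !table.get(new Get(Bytes.toBytes(rowKey))).isEmpty
        }.toList
        table.close()
        exists.iterator
      })

    existsRdd.collect().foreach(println)
    sc.stop()
  }
}
```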
HBaseContext is a façade for HBase operations like bulk put, get, increment, delete, and scan.
HBaseContext will take the responsibility of disseminating the configuration information to the worker nodes and managing the life cycle of Connections.
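Constructing an HBaseContext might look like the following sketch, assuming the org.apache.hadoop.hbase.spark package; the driver hands over a SparkContext and an HBase Configuration once, and all the bulk methods above are then called on the resulting instance.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.{SparkConf, SparkContext}

object HBaseContextSetup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseContextSetup"))

    // The HBase configuration is handed to HBaseContext once on the driver;
    // HBaseContext distributes it and manages Connections on the executors.
    val conf = HBaseConfiguration.create()
    val hbaseContext = new HBaseContext(sc, conf)

    sc.stop()
  }
}
```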