org.apache.hadoop.hbase.client
Class HTableUtil

java.lang.Object
  extended by org.apache.hadoop.hbase.client.HTableUtil

@InterfaceAudience.Public
@InterfaceStability.Stable
public class HTableUtil
extends Object

Utility class for HTable.


Constructor Summary
HTableUtil()
           
 
Method Summary
static void bucketRsBatch(HTable htable, List<Row> rows)
          Processes a List of Rows (Put, Delete) and writes them to an HTable instance in RegionServer buckets via the htable.batch method.
static void bucketRsPut(HTable htable, List<Put> puts)
          Processes a List of Puts and writes them to an HTable instance in RegionServer buckets via the htable.put method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTableUtil

public HTableUtil()

Method Detail

bucketRsPut

public static void bucketRsPut(HTable htable,
                               List<Put> puts)
                        throws IOException
Processes a List of Puts and writes them to an HTable instance in RegionServer buckets via the htable.put method. This will utilize the writeBuffer, thus the writeBuffer flush frequency may be tuned accordingly via htable.setWriteBufferSize.

The benefit of submitting Puts in this manner is to minimize the number of RegionServer RPCs in each flush.

Assumption #1: Regions have been pre-created for the table. If they haven't, then all of the Puts will go to the same region, defeating the purpose of this utility method. See the Apache HBase book for an explanation of how to do this.
Assumption #2: Row-keys are not monotonically increasing. See the Apache HBase book for an explanation of this problem.
Assumption #3: The input list of Puts is big enough to be useful (in the thousands or more). The intent of this method is to process larger chunks of data.
Assumption #4: htable.setAutoFlush(false) has been set. This is a requirement to use the writeBuffer.

Parameters:
htable - HTable instance for target HBase table
puts - List of Put instances
Throws:
IOException - if a remote or network exception occurs
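A minimal usage sketch under the four assumptions above. The table name "mytable", column family "cf", and qualifier "q" are hypothetical; this assumes a running HBase cluster, a pre-split table, and the 0.94-era client API on the classpath, so it is illustrative rather than directly runnable here.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketRsPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable htable = new HTable(conf, "mytable");   // hypothetical, pre-split table (Assumption #1)
    try {
      htable.setAutoFlush(false);                  // required to use the writeBuffer (Assumption #4)
      htable.setWriteBufferSize(4 * 1024 * 1024);  // tunes writeBuffer flush frequency

      List<Put> puts = new ArrayList<Put>();
      for (int i = 0; i < 10000; i++) {            // larger chunks of data (Assumption #3)
        // Real row-keys should be salted or hashed so they are not
        // monotonically increasing (Assumption #2).
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
        puts.add(put);
      }

      // Writes the Puts in RegionServer buckets, minimizing RPCs per flush.
      HTableUtil.bucketRsPut(htable, puts);
    } finally {
      htable.close();
    }
  }
}
```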

bucketRsBatch

public static void bucketRsBatch(HTable htable,
                                 List<Row> rows)
                          throws IOException
Processes a List of Rows (Put, Delete) and writes them to an HTable instance in RegionServer buckets via the htable.batch method.

The benefit of submitting Rows in this manner is to minimize the number of RegionServer RPCs: one batch RPC is issued per RegionServer.

Assumption #1: Regions have been pre-created for the table. If they haven't, then all of the Puts will go to the same region, defeating the purpose of this utility method. See the Apache HBase book for an explanation of how to do this.
Assumption #2: Row-keys are not monotonically increasing. See the Apache HBase book for an explanation of this problem.
Assumption #3: The input list of Rows is big enough to be useful (in the thousands or more). The intent of this method is to process larger chunks of data.

This method accepts a List of Row objects because the underlying htable.batch method does.

Parameters:
htable - HTable instance for target HBase table
rows - List of Row instances
Throws:
IOException - if a remote or network exception occurs
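The per-RegionServer bucketing both methods perform can be sketched in plain Java. Here getServerFor is a hypothetical stand-in for the client's real region-location lookup, which maps each row-key to the RegionServer hosting its region; this is only the grouping idea, not the HTableUtil implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BucketingSketch {

  // Hypothetical stand-in for the region-location lookup: maps a row-key
  // to the name of the server hosting its region. Here we pretend three
  // servers split the key space by hash.
  static String getServerFor(String rowKey) {
    return "server-" + (Math.abs(rowKey.hashCode()) % 3);
  }

  // Group row-keys into one bucket per server; each bucket would then be
  // submitted as a single batch -- the core idea behind bucketRsBatch.
  static Map<String, List<String>> bucket(List<String> rowKeys) {
    Map<String, List<String>> buckets = new HashMap<String, List<String>>();
    for (String key : rowKeys) {
      String server = getServerFor(key);
      List<String> bucket = buckets.get(server);
      if (bucket == null) {
        bucket = new ArrayList<String>();
        buckets.put(server, bucket);
      }
      bucket.add(key);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> keys = new ArrayList<String>();
    for (int i = 0; i < 1000; i++) {
      keys.add("row-" + i);
    }
    Map<String, List<String>> buckets = bucket(keys);
    int total = 0;
    for (List<String> b : buckets.values()) {
      total += b.size();
    }
    System.out.println(buckets.size() + " buckets, " + total + " rows");
  }
}
```

With pre-created regions spread across servers, each bucket becomes one RPC instead of one RPC per row; with a single region (Assumption #1 violated), everything lands in one bucket and the grouping buys nothing.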


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.