HBase Performance Optimization

Please refer to the earlier posts in this series -

First post, on reducing the number of regions per Region Server - https://techdevins.blogspot.com/2023/03/hbase-utility-merging-regions-in-hbase.html

Second post, on deleting columns in HBase - https://techdevins.blogspot.com/2019/11/hbase-bulk-delete-column-qualifiers.html

In this article, we discuss options to further optimize HBase.

We can set COMPRESSION=>'SNAPPY' on the column families and invoke a major compaction right after setting the property. The compaction is needed because existing store files are only rewritten with the new compression when they are compacted. Depending on the data, this can reduce table size by as much as 70% with little impact on read and write performance. Once the regions and tables have shrunk, we can re-run the Merge Region utility to reduce the number of regions per server.
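
The steps above can be sketched in the HBase shell as follows. This is a minimal sketch, not a definitive procedure: the table name 'tbl' and column family 'cf' are placeholders, and it assumes the Snappy native libraries are available on every Region Server.

```ruby
# Enable Snappy compression on one column family.
# 'tbl' and 'cf' are placeholder names, not from the original setup.
alter 'tbl', {NAME => 'cf', COMPRESSION => 'SNAPPY'}

# Existing store files stay uncompressed until they are rewritten;
# a major compaction rewrites them with the new setting.
major_compact 'tbl'

# Check that the column family now reports COMPRESSION => 'SNAPPY'.
describe 'tbl'
```

The major_compact command only requests the compaction and returns immediately, and the compaction itself is I/O heavy, so it is usually better to run it during off-peak hours.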

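Set the region split policy to SPLIT_POLICY=>'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'. With this policy a region splits only when one of its stores grows past the configured maximum file size (hbase.hregion.max.filesize, or the table's MAX_FILESIZE), which keeps the number of regions predictable.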
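
One possible way to apply this to an existing table from the HBase shell is sketched below. The table name 'tbl' and the 10 GB threshold are illustrative assumptions, and the exact alter syntax can differ between HBase releases, so check the shell help for your version first.

```ruby
# Set the split policy as table-level metadata ('tbl' is a placeholder name).
alter 'tbl', {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}

# Optionally set the split threshold for this table only.
# 10737418240 bytes = 10 GB, an illustrative value.
alter 'tbl', MAX_FILESIZE => '10737418240'

# Check the table attributes after the change.
describe 'tbl'
```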

Analytical workloads create heavy and uncontrolled traffic on the cluster in terms of requests per second. One possible way to address this is to throttle analytical jobs so that real-time workloads are less affected.
HBase has a Request Throttling feature (https://issues.apache.org/jira/browse/HBASE-11598) that can be used to favor real-time calls over analytical calls. By default HBase treats all requests identically; throttling overrides this behavior by capping the request rate or bandwidth. A limit can be applied to requests originating from a particular user, or to requests directed at a given table or namespace. For example -

hbase> set_quota TYPE => THROTTLE, USER => 'uname', LIMIT => '100req/sec'
hbase> set_quota TYPE => THROTTLE, TABLE => 'tbl', LIMIT => '10M/sec'
hbase> set_quota TYPE => THROTTLE, NAMESPACE => 'ns', LIMIT => 'NONE'

Note - Quotas must be enabled with hbase.quota.enabled set to true for throttling to take effect. The related parameter hbase.quota.refresh.period specifies the interval, in milliseconds, at which each Region Server re-checks for quotas that have been added or changed.
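
As a further sketch of how an analytical user might be throttled and the result checked, the commands below assume a hypothetical user 'analytics_user' and table 'tbl'. The THROTTLE_TYPE option is only available in releases that support separate read and write throttles.

```ruby
# Cap only the reads of the analytical user; its writes stay unthrottled.
# 'analytics_user' is a placeholder name.
set_quota TYPE => THROTTLE, THROTTLE_TYPE => READ, USER => 'analytics_user', LIMIT => '100req/sec'

# Narrow a limit to one user on one table ('tbl' is a placeholder name).
set_quota TYPE => THROTTLE, USER => 'analytics_user', TABLE => 'tbl', LIMIT => '10M/sec'

# List the quotas currently defined.
list_quotas

# Remove the user-level throttle again.
set_quota TYPE => THROTTLE, USER => 'analytics_user', LIMIT => NONE
```

A new or removed limit is picked up by the Region Servers only after the next quota refresh, so it may not be enforced immediately.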

Tech Devins