HBase多线程扫描真的很慢

发布于 2024-12-27 17:20:12 字数 508 浏览 5 评论 0原文

我正在使用 HBase 来存储一些时间序列数据。根据 O'Reilly HBase 书中的建议，我使用的行键是带有加盐前缀的数据的时间戳。为了查询这些数据，我生成了多个线程，这些线程在一系列时间戳上实现扫描，每个线程处理特定的前缀。然后将结果放入并发哈希图中。

当线程尝试执行扫描时会出现问题。串行完成时通常需要大约 5600 毫秒的查询在生成 6 个线程（对应于 6 个盐/区域服务器）时需要 40000 到 80000 毫秒。

我尝试使用 HTablePools 来解决我认为 HTable 不是线程安全的问题，但这并没有带来任何更好的性能。

特别是，当我执行这部分代码时，我注意到速度显着减慢：

for(Result res : rowScanner){
//add Result To HashMap

通过日志记录，我注意到每次通过循环的条件时，我都会经历很多秒的延迟。如果我强制线程串行执行，这些延迟就不会发生。

我认为资源锁定存在某种问题，但我只是看不到它。

原文

I'm using HBase to store some time series data. Using the suggestion in the O'Reilly HBase book I am using a row key that is the timestamp of the data with a salted prefix. To query this data I am spawning multiple threads which implement a scan over a range of timestamps with each thread handling a particular prefix. The results are then placed into a concurrent hashmap.

Trouble occurs when the threads attmept to perform their scan. A query that normally takes approximately 5600 ms when done serially takes between 40000 and 80000 ms when 6 threads are spawned (corresponding to 6 salts/region servers).

I've tried to use HTablePools to get around what I thought was an issue with HTable being not thread-safe, but this did not result in any better performance.

in particular I am noticing a significant slow down when I hit this portion of my code:

for(Result res : rowScanner){
//add Result To HashMap

Through logging I noticed that everytime through the conditional of the loop I experienced delays of many seconds. These delays do not occur if I force the threads to execute serially.

I assume that there is some kind of issue with resource locking but I just can't see it.

分享到QQ

分享到微博