HBase performance
I am using Spring + DataNucleus JDO + HBase. HBase is in fully distributed mode with two nodes. I am facing serious performance issues here.
My webapp can be considered a pinger: it just keeps pinging URLs and stores their responses. Hence my app runs multiple threads that INSERT into the db. I have observed that once the number of concurrent writes exceeds around 20, the inserts start taking a lot of time (some take even 1000 secs). And when this happens, READs start failing too and my webapp cannot extract any data from the db (it hangs). I am not much of a NoSQL db guy and hence do not know where to start looking for performance.
My major configurations are:
ZooKeeper quorum size: 1
HBase regionservers: 2
DataNodes: 2
hbase.zookeeper.property.maxClientCnxns: 400
Replication factor: 3
Do I need to increase the heap size for HBase? Should a high WRITE throughput have an effect on READs?
Am I doing something wrong with the configuration? It seems writing to a plain file would be faster than writing data to HBase. This is my last shot at HBase. Please help.
2 Answers
The big problem that I see is you are running HBase on 2 nodes with a replication factor of 3 (in effect just 2, since there are only 2 nodes to replicate to). This means every write must be replicated to both nodes. HBase really needs at least 5 or so nodes to get going.
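One concrete way to act on this: with only two DataNodes, a replication factor of 3 can never actually be satisfied, so matching it to the cluster size removes permanently under-replicated blocks. Assuming the factor comes from the standard HDFS setting, the change would look like this in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: with only 2 DataNodes, a replication factor of 3
     can never be met; set it to the number of nodes actually available -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Existing files keep their old replication target until changed with `hdfs dfs -setrep`, so apply this before loading data if possible.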
It sounds like you are filling up your first region and it is splitting; during the split, once the MemStore fills up, you will start blocking. You should look into creating your table pre-split into multiple regions, which will give you an even distribution of writes.
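A minimal sketch of computing split points for pre-splitting, assuming row keys are roughly uniform over the full byte range (if yours are not, derive splits from your actual key distribution instead). The resulting `byte[][]` is what you would hand to the HBase admin API's `createTable(descriptor, splits)` overload:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeys {
    // Generate numRegions - 1 evenly spaced single-byte split points,
    // dividing the 0x00..0xFF key space into numRegions regions.
    public static byte[][] generate(int numRegions) {
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int boundary = (i * 256) / numRegions; // region boundary in byte space
            splits.add(new byte[] { (byte) boundary });
        }
        return splits.toArray(new byte[0][]);
    }

    public static void main(String[] args) {
        // 4 regions -> 3 split points: 0x40, 0x80, 0xC0
        for (byte[] s : generate(4)) {
            System.out.printf("0x%02X%n", s[0]);
        }
    }
}
```

With the table pre-split this way, concurrent writers land on different regions (and region servers) from the start instead of hammering one region until it splits.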
I recommend taking a look at the HBase book's chapter on performance, specifically the part on pre-splitting tables.
You should also use compression. Make sure you get native compression working (gzip, lzo or snappy) - don't use the pure Java compression, otherwise you'll be really, really slow; the link discusses that a bit.
If you're going to write to HBase using multiple threads, you need to make sure you are reusing your HBaseConfiguration as often as possible. Otherwise, each thread makes a new connection and ZK will eventually stop offering connections until old ones close.
A quick solution is to let a singleton handle passing the configuration to your HTable objects. This should guarantee the same configuration is used and will minimize your concurrent connections.