HBase reads under high load
I'm researching a NoSQL solution for our company's needs.
For now the search has narrowed to HBase. I've read a lot about its architecture, performance and so on, but one thing is still unclear to me.
For example, say you have a 100-node cluster and one row gets 100,000 simultaneous requests. In that case, will all 100,000 requests hit only the one node where the row is stored? As I understand it, HBase replication is only for data backup (not for read load balancing), and there is no master/slave mechanism (like in MySQL)?
Comments (3)
Regarding 100,000 concurrent requests for a single row: I don't think any store handles that well at the moment. Under normal conditions it is simply not needed. Clients are isolated from the DB anyway, so access to it is limited (and the hot data is probably cached).
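To make the caching point concrete, here is a minimal sketch of a read-through cache sitting between the clients and the store. It is not HBase code: `HotRowCache`, `fake_store_get` and `simulate` are hypothetical names invented for this illustration, `fake_store_get` merely stands in for a real row lookup, and the TTL value is an arbitrary assumption.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class HotRowCache:
    """Read-through cache with a TTL, placed in front of a slower store."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 1.0):
        self._fetch = fetch  # hypothetical stand-in for a real row lookup
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, row_key: str) -> bytes:
        with self._lock:
            entry = self._entries.get(row_key)
            if entry is not None and time.monotonic() - entry[0] < self._ttl:
                return entry[1]  # fresh enough: the store is not touched
            # Miss or stale entry: fetch while holding the lock, so that
            # concurrent callers wait here instead of all hitting the store.
            value = self._fetch(row_key)
            self._entries[row_key] = (time.monotonic(), value)
            return value


def simulate(num_clients: int = 100) -> int:
    """Fire concurrent reads of one row and return how often the store was hit."""
    store_hits: List[str] = []

    def fake_store_get(row_key: str) -> bytes:
        store_hits.append(row_key)
        return b"value-for-" + row_key.encode()

    cache = HotRowCache(fake_store_get, ttl_seconds=60.0)
    threads = [
        threading.Thread(target=cache.get, args=("hot-row",))
        for _ in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(store_hits)


if __name__ == "__main__":
    print(simulate(100))  # number of reads that actually reached the store
```

The single lock keeps the sketch short but also serializes lookups for unrelated keys; a real setup would more likely use per-key locking or an existing cache server.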
Regarding storage and replication: first, there are at least two types of replication, and the one that protects your stored data is actually not done by HBase. HBase relies on HDFS, which is fault tolerant by nature. Read about the HBase master and HBase region server roles if you need to understand the details, but in general everything related to replication is handled by HDFS.
I guess 100,000 concurrent requests will not work very well on HBase. However, real-world scenarios seem to work quite well: yfrog gets 10K requests per second, eBay chose it for the new version of their product search engine, and Facebook chose it for their messaging system.
You can also take a look at the hstack benchmarks on a more modest cluster.
HBase replication is not only for data backup but also for availability. Since that does not seem to be the only point your question covers, I am pointing you to the link below, where you can find more information. If you have specific questions about your schema design, start with the home page of the Apache-hosted project. For your last question, about master/slave, the same URL applies (and you can ask the HBase developers if you are still unsure): http://hbase.apache.org/replication.html