How should I design an HBase schema?
Suppose that I have this RDBMS table (Entity-attribute-value model):
col1: entityID
col2: attributeName
col3: value
and I want to use HBase due to scaling issues.
I know that the only way to access an HBase table is by its primary key (cursor): you can get a cursor for a specific key and iterate over the rows one by one.
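To make that access model concrete, here is a minimal in-memory sketch of a table sorted by row key that supports only a point lookup and a range scan. It is a toy stand-in and not the HBase client API; the class `SortedTable` and its methods `put`, `get`, and `scan` are hypothetical names chosen for illustration.

```python
import bisect


class SortedTable:
    """Toy in-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        # Insert the key in sorted position the first time the row is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup by row key; returns None if the row is absent."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
table.put("e1", "color", "red")
table.put("e2", "color", "blue")
table.put("e3", "size", "large")

row = table.get("e2")                                  # lookup by key
scanned = [key for key, _ in table.scan("e1", "e3")]   # cursor over a key range
```

Any query that cannot be phrased as "this key" or "this key range" has no direct support in such a model, which is why the choice of row key matters so much.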
The issue is that, in my case, I want to be able to iterate over all 3 columns.
For example:
- for a given entityID, I want to get all its attributes and values
- for a given attributeName and value, I want to get all the entityIDs
...
So one idea I had is to build one HBase table that holds the data (table DATA, with entityID as the primary index), and 2 "index" tables, one with attributeName as the primary key and the other with value.
Each index table would hold a list of pointers (entityIDs) into the DATA table.
Is this a reasonable approach, or is it an 'abuse' of HBase concepts?
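The three-table idea can be sketched in plain Python, with dictionaries standing in for the DATA table and the two index tables. All names here (`data`, `by_attribute`, `by_value`, `put`, `entities_with`) are hypothetical, and keeping the three structures consistent is left entirely to the application in this sketch.

```python
from collections import defaultdict

data = defaultdict(dict)          # DATA: entityID -> {attributeName: value}
by_attribute = defaultdict(set)   # index 1: attributeName -> {entityID, ...}
by_value = defaultdict(set)       # index 2: value -> {entityID, ...}


def put(entity_id, attribute, value):
    """Write one EAV triple and keep both index tables in step."""
    old = data[entity_id].get(attribute)
    data[entity_id][attribute] = value
    by_attribute[attribute].add(entity_id)
    by_value[value].add(entity_id)
    # On overwrite, drop the stale value-index entry unless another
    # attribute of the same entity still carries the old value.
    if old is not None and old != value and old not in data[entity_id].values():
        by_value[old].discard(entity_id)


def entities_with(attribute, value):
    """Intersect both indexes, then confirm each candidate against DATA."""
    candidates = by_attribute[attribute] & by_value[value]
    return sorted(e for e in candidates if data[e].get(attribute) == value)


put("e1", "color", "red")
put("e2", "color", "blue")
put("e2", "mood", "red")

matches = entities_with("color", "red")
```

Note the confirmation step: because the two indexes are independent, "e2" appears in both the `color` index and the `red` index without having `color = red`, so a lookup by attributeName and value needs a second read of DATA for every candidate.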
HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don't worry - Lucene to the rescue! But that's another post.)
Do you know how Lucene can help?
-- Yonatan
Comments (2)
Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking into them. Check out http://www.mail-archive.com/[email protected]/msg04801.html.
In the meantime, though, if your application's data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema), you might like to check out the solution that Hypertable proposes for secondary-index-type needs: http://markmail.org/message/rphm4q6cbar2ycgp
I recommend having two different flat tables: one for looking up attributes+values given an entityID, and one for looking up entityIDs given an attribute+value.
Table 1 would look like this:
and Table 2:
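The table layouts themselves are not shown above. As one possible reading, and purely as an assumption, each flat table could use a composite row key so that both lookups become prefix scans: Table 1 keyed by entityID, attributeName, value, and Table 2 keyed by attributeName, value, entityID. The sketch below simulates this with sorted Python lists; the separator `SEP` and the helpers `make_key` and `prefix_scan` are hypothetical.

```python
import bisect

SEP = "\x00"  # assumed separator; must not occur inside IDs, names, or values


def make_key(*parts):
    """Join key parts into one composite row key."""
    return SEP.join(parts)


def prefix_scan(sorted_keys, *prefix_parts):
    """Return all keys that start with the given parts, via a range scan."""
    start = make_key(*prefix_parts) + SEP
    # Smallest string that sorts after every key sharing this prefix.
    stop = start[:-1] + chr(ord(SEP) + 1)
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [tuple(key.split(SEP)) for key in sorted_keys[lo:hi]]


triples = [
    ("e1", "color", "red"),
    ("e1", "size", "large"),
    ("e2", "color", "red"),
    ("e10", "color", "blue"),
]

# Table 1: row key = entityID, attributeName, value
table1 = sorted(make_key(e, a, v) for e, a, v in triples)
# Table 2: row key = attributeName, value, entityID
table2 = sorted(make_key(a, v, e) for e, a, v in triples)

attrs_of_e1 = [(a, v) for _, a, v in prefix_scan(table1, "e1")]
red_entities = [e for _, _, e in prefix_scan(table2, "color", "red")]
```

With this layout every triple is written twice, once per table, and neither query needs a second read to confirm its results. The trailing separator in the scan bounds keeps "e10" out of a scan for "e1".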