Referential integrity and HBase
One of the first sample schemas you read about in the HBase FAQ is the Student-Course example for a many-to-many relationship. The schema has a Courses column in the Student table and a Students column in the Course table.
But I don't understand how you guarantee integrity between these two objects in HBase. If something were to crash after updating one table but before updating the other, we'd have a problem.
I see there is a transaction facility, but what is the cost of using this on what might be every Put? Or are there other ways to think about the problem?
Comments (4)
We hit the same issue.
I have developed a commercial plugin for HBase that handles transactions and the relationship issues that you mention. Specifically, we use DataNucleus for a JDO-compliant environment. Our plugin is listed on this page http://www.datanucleus.org/products/accessplatform_3_0/datastores.html or you can go directly to our small blog http://www.inciteretail.com/?page_id=236.
We use JTA for our transaction service. So in your case, we would handle the relationship issue and also any inserts for index tables (it's hard to have an app without index lookup and sorting!).
Without an additional log you won't be able to guarantee integrity between these two objects. HBase only has atomic updates at the row level. You could probably use that property, though, to build a transaction (Tx) log that lets you recover after a failure.
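To make the Tx log idea concrete, here is a minimal sketch in plain Python. It does not use the HBase client at all: `FakeTable`, `enroll`, `recover`, the `tx:intent` column, and the row keys are all hypothetical names invented for illustration. The only property it relies on is the one stated above, that a write to a single row is atomic.

```python
import json


class Crash(Exception):
    """Simulated process failure between two single-row writes."""


class FakeTable:
    """In-memory stand-in for a table (an assumption, not the HBase API).
    Each put touches exactly one row."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def delete_row(self, row_key):
        self.rows.pop(row_key, None)


def apply_enrollment(student, course, students, courses, crash_midway=False):
    # Both puts are idempotent, so replaying them after a crash is safe.
    students.put(student, "courses:" + course, "1")
    if crash_midway:
        raise Crash("died between the two table updates")
    courses.put(course, "students:" + student, "1")


def enroll(student, course, tx_log, students, courses, crash_midway=False):
    tx_id = student + "|" + course
    intent = json.dumps({"student": student, "course": course})
    tx_log.put(tx_id, "tx:intent", intent)  # 1. record the intent in one row
    apply_enrollment(student, course, students, courses, crash_midway)  # 2. do the work
    tx_log.delete_row(tx_id)  # 3. remove the entry once both tables agree


def recover(tx_log, students, courses):
    # Any entry still in the log belongs to a write that may be half-applied.
    for tx_id in list(tx_log.rows):
        intent = json.loads(tx_log.rows[tx_id]["tx:intent"])
        apply_enrollment(intent["student"], intent["course"], students, courses)
        tx_log.delete_row(tx_id)


tx_log, students, courses = FakeTable(), FakeTable(), FakeTable()
try:
    enroll("s1", "c1", tx_log, students, courses, crash_midway=True)
except Crash:
    pass  # Student row is updated, Course row is not, log entry remains.

half_applied = "c1" not in courses.rows
recover(tx_log, students, courses)
```

Note that this only gets you eventual repair, not isolation: between the crash and the recovery pass, a reader can still see the Student row without the matching Course row.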
If you have to perform two INSERTs as a single unit of work, that means you have to use a transaction manager to preserve ACID properties. There's no other way to think about the problem that I know of.
The cost is less of a concern than referential integrity. Code it properly and don't worry about performance. Your code will be the first place to look for performance problems, not the transaction manager.
Logical relational models use two main varieties of relationships: one-to-many and many-to-many. Relational databases model the former directly as foreign keys (whether explicitly enforced by the database as constraints, or implicitly referenced by your application as join columns in queries) and the latter as junction tables (additional tables where each row represents one instance of a relationship between the two main tables). There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
The first thing to note is that HBase, not having any built-in joins or constraints, has little use for explicit relationships. You can just as easily place data that is one-to-many in nature into HBase tables. But this is only a relationship in that some parts of the row in the former table happen to correspond to parts of rowkeys in the latter table. HBase knows nothing of this relationship, so it’s up to your application to do things with it (if anything).
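As a rough illustration of a relationship that lives only in the rowkeys, here is a small in-memory sketch in plain Python. It is not the HBase client API: `SortedTable`, the `customers` and `orders` tables, the `#` separator, and the column names are hypothetical choices made for this example. The "many" side embeds the parent's key as a rowkey prefix, and the application walks the relationship with a prefix scan over sorted keys.

```python
from bisect import bisect_left


class SortedTable:
    """In-memory stand-in for a table with lexicographically sorted rowkeys
    (an assumption for illustration, not the HBase API)."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then read until the prefix stops matching.
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


customers = SortedTable()
orders = SortedTable()

customers.put("c1", "info:name", "Ann")
orders.put("c1#0002", "info:total", "12")
orders.put("c1#0001", "info:total", "30")
orders.put("c10#0001", "info:total", "7")

# Nothing stops this write: there is no customer "c9", and the store does not care.
orders.put("c9#0001", "info:total", "99")

# The application, not the store, interprets the prefix as a parent reference.
c1_orders = [key for key, _ in orders.scan_prefix("c1#")]
orphan_parent = customers.get("c9")
```

Here `c1_orders` holds only the two `c1#` keys in sorted order, and `orphan_parent` is `None`, which is exactly the kind of dangling reference the application has to detect or prevent on its own.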