Transactions with the Cassandra data model
According to the CAP theorem, Cassandra can only offer eventual consistency. To make things worse, if we do multiple reads and writes during one request without proper handling, we may even lose logical consistency. In other words, if we do things fast, we may do them wrong.
Meanwhile, the best practice for designing a Cassandra data model is to think about the queries we are going to run and then add a column family (CF) for each of them. As a result, adding or updating one entity often means updating many views/CFs. Without an atomic transaction feature, it's hard to get this right. But with one, we give up the A and P parts again.
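To make the concern concrete, here is a minimal sketch of how one logical update can leave two query-specific CFs disagreeing. It uses plain Python dicts as stand-ins for column families; the names `users_by_id`, `users_by_email`, `save_user`, and `WriteFailed` are hypothetical and not part of any Cassandra API.

```python
# Hypothetical in-memory stand-ins for two column families that
# serve different queries over the same user entity.
users_by_id = {}
users_by_email = {}


class WriteFailed(Exception):
    """Simulates a node timeout or a client crash between two writes."""


def save_user(user_id, email, name, fail_after_first=False):
    # Write 1: the CF that answers "get user by id".
    users_by_id[user_id] = {"email": email, "name": name}
    if fail_after_first:
        raise WriteFailed("second write never happened")
    # Write 2: the CF that answers "get user by email".
    users_by_email[email] = {"user_id": user_id, "name": name}


save_user("u1", "ann@example.com", "Ann")
try:
    save_user("u2", "bob@example.com", "Bob", fail_after_first=True)
except WriteFailed:
    pass

# u2 is visible through one query path but not the other.
inconsistent = [uid for uid, row in users_by_id.items()
                if row["email"] not in users_by_email]
print(inconsistent)
```

The second user can be found by id but not by email, which is the kind of logical inconsistency the question is about.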
I don't see this concerning many people, so I wonder why.
- Is it because we can always find a way to design our data model so that we avoid multiple reads and writes in one session?
- Is it because we can simply ignore the "right" part?
- In real-world practice, do we always have ACID support somewhere in the middle? I mean, is it perhaps implemented in the application layer, or handled by added middleware?
Comments (1)
It does concern people, but presumably you are using Cassandra because a single database server cannot meet your needs due to scaling or reliability concerns. Because of this, you are forced to work around the limitations of a distributed system.
No, you don't usually have ACID somewhere else, as presumably that somewhere else would have to be distributed over multiple machines as well. Instead, you design your application around the limitations of a distributed system.
If you are updating multiple columns to satisfy queries, you can look at the "eventually atomic" section in this presentation for ideas on how to do that. Basically, you write enough information about your update to Cassandra before you perform the actual writes. That way, if a write fails, you can retry it later.
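The idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the presentation: dicts stand in for column families, `pending_updates` is a hypothetical "intent log" CF, and the writes are assumed to be idempotent so that replaying them is safe.

```python
import uuid

# Hypothetical in-memory stand-ins; in a real system each dict would be
# a Cassandra column family.
pending_updates = {}   # the "intent log", written before the real writes
users_by_id = {}
users_by_email = {}


def apply_update(update, fail_midway=False):
    """Idempotent: running it twice leaves the same final state."""
    users_by_id[update["user_id"]] = {"email": update["email"]}
    if fail_midway:
        raise RuntimeError("simulated failure between the two writes")
    users_by_email[update["email"]] = {"user_id": update["user_id"]}


def save_user(user_id, email, fail_midway=False):
    update_id = str(uuid.uuid4())
    update = {"user_id": user_id, "email": email}
    # Step 1: record what we are about to do.
    pending_updates[update_id] = update
    # Step 2: do the writes; on failure the log entry stays behind.
    apply_update(update, fail_midway)
    # Step 3: all writes succeeded, so the log entry can go.
    del pending_updates[update_id]


def retry_pending():
    """Run later (e.g. by a background job) to finish interrupted updates."""
    for update_id in list(pending_updates):
        apply_update(pending_updates[update_id])
        del pending_updates[update_id]


try:
    save_user("u1", "ann@example.com", fail_midway=True)
except RuntimeError:
    pass

half_done = "ann@example.com" not in users_by_email  # views disagree for now
retry_pending()
repaired = users_by_email["ann@example.com"]["user_id"] == "u1"
print(half_done, repaired, len(pending_updates))
```

Readers can still observe the half-applied state between the failure and the retry, which is why the result is "eventually" atomic rather than atomic.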
If you can structure your application in such a way, using a coordination service like ZooKeeper or Cages may be useful.
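As a rough illustration of what a coordination service buys you, the sketch below serializes a read-modify-write behind a lock. Note the assumptions: `threading.Lock` is only a single-process stand-in for a distributed lock, the `accounts` dict is a hypothetical column family, and none of this is ZooKeeper or Cages API.

```python
import threading

# Stand-in for a distributed lock. With ZooKeeper or Cages the lock would
# live in the coordination service, not in this process.
row_lock = threading.Lock()

# Hypothetical in-memory "column family" holding an account balance.
accounts = {"acct-1": {"balance": 0}}


def deposit(account_id, amount):
    # Hold the lock across the whole read-modify-write so no other
    # writer can interleave between the read and the write.
    with row_lock:
        current = accounts[account_id]["balance"]            # read
        accounts[account_id]["balance"] = current + amount   # write


threads = [threading.Thread(target=deposit, args=("acct-1", 10))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(accounts["acct-1"]["balance"])
```

The trade-off is the one raised in the question: every writer to the same row now waits on the lock, so consistency is gained at the cost of availability and latency for that row.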