Read-your-own-writes consistency in Cassandra
Read-your-own-writes consistency is a great improvement over so-called eventual consistency: if I change my profile picture, I don't care if others see the change a minute later, but it looks weird if I still see the old one after a page reload.
Can this be achieved in Cassandra without having to do a full read-check on more than one node?
Using ConsistencyLevel.QUORUM is fine when reading arbitrary data, where n>1 nodes are actually being read. However, when a client reads from the same node it wrote to (and actually uses the same connection), it can be wasteful: some databases will in this case always ensure that the previously written (my) data is returned, and not some older version. Using ConsistencyLevel.ONE does not ensure this, and assuming that it does leads to race conditions. Some tests showed this: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/per-connection-quot-read-after-my-write-quot-consistency-td6018377.html
My hypothetical setup for this scenario is 2 nodes, replication factor 2, read level 1, write level 1. This leads to eventual consistency, but I want read-your-own-writes consistency on reads.
Using 3 nodes with RF=3, RL=quorum and WL=quorum leads, in my opinion, to wasteful read requests if being consistent only on "my" data is enough.
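The difference between the two setups comes down to the usual replica-overlap rule: a read is guaranteed to touch at least one replica holding the latest acknowledged write only when the number of replicas read plus the number of replicas written exceeds the replication factor. A minimal Python sketch of that arithmetic follows; the function names are illustrative only and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority quorum for the given replication factor."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        read_replicas: int,
                        write_replicas: int) -> bool:
    """True when every read set must share at least one replica
    with every write set, i.e. R + W > RF."""
    return read_replicas + write_replicas > replication_factor


# Setup from the question: RF=2, read level 1, write level 1.
print(read_overlaps_write(2, 1, 1))                    # no guaranteed overlap

# Quorum setup: RF=3, read quorum, write quorum.
print(read_overlaps_write(3, quorum(3), quorum(3)))    # overlap guaranteed
```

With RF=2 and both levels at 1, the read may land on the replica that has not yet received the write, which is exactly the eventual-consistency case described above.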
// seo: also known as: session consistency, read-after-my-write consistency
Comments (2)
Good question.
We've had http://issues.apache.org/jira/browse/CASSANDRA-876 open for a while to add this, but nobody's bothered finishing it because
That said, if you're motivated to help, ask on the ticket and I'll be happy to point you in the right direction.
I've been following Cassandra development for a little while and I haven't seen a feature like this mentioned.
That said, if you only have 2 nodes with a replication factor of 2, I would question whether Cassandra is the best solution. You are going to end up with the entire data set on each node, so a more traditional replicated SQL setup might be simpler and more widely tested. Cassandra is very promising but it is still only version 0.8.2 and problems are regularly reported on the mailing list.
The other way to solve the 'see my own updates' problem would be to cache the results somewhere closer to the client, whether in the web server, the application layer, or using something like memcached.
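As a rough illustration of the caching idea, here is a minimal Python sketch of a per-session wrapper that remembers the session's own recent writes and serves them back on read. Everything in it is an assumption made for the example: the class name, the `read_fn`/`write_fn` callables standing in for whatever client performs the actual database calls, and the 60-second window. It only helps if a user's requests reach the same process, or if the dictionary is replaced by a shared store such as memcached.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadYourWritesCache:
    """Serves a session's own recent writes from memory (hypothetical helper)."""

    def __init__(self,
                 read_fn: Callable[[Hashable], Any],
                 write_fn: Callable[[Hashable, Any], None],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_fn = read_fn      # assumed: reads from the database
        self._write_fn = write_fn    # assumed: writes to the database
        self._ttl = ttl_seconds      # how long replicas may lag (a guess)
        self._clock = clock
        self._recent: Dict[Hashable, Tuple[Any, float]] = {}

    def write(self, key: Hashable, value: Any) -> None:
        # Write through to the database, then remember the value locally.
        self._write_fn(key, value)
        self._recent[key] = (value, self._clock() + self._ttl)

    def read(self, key: Hashable) -> Any:
        entry = self._recent.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                return value         # our own write, still within the window
            del self._recent[key]    # window elapsed, trust the database again
        return self._read_fn(key)


# Usage with a fake backend whose reads lag behind its writes.
stale_replica = {"avatar": "old.png"}
primary = {}
session = ReadYourWritesCache(stale_replica.get, primary.__setitem__)
session.write("avatar", "new.png")
print(session.read("avatar"))
```

The expiry window is the weak point of this design: it has to be longer than the time replicas need to converge, otherwise a read after expiry can still return the old value.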