Cassandra = atomicity/isolation of column updates on a single row on a single node?
Sorry to have to ask something about Cassandra again; I would very much appreciate your advice:
I have read this: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic and am completely lost and wondering:
Is it really true that in Cassandra WRITES on ONE SINGLE NODE for ONE ROW-KEY (with MANY COLUMNS to be updated in the SAME COLUMNFAMILY, using batch_mutate) are NOT ISOLATED against a READ on the SAME NODE
to the SAME ROW-KEY'S COLUMNS, so there is no guarantee that a read does not read "partly changed data"? Example:
Current Status: [KEY=1 , ColumnName=A with Value=A , ColumnName=B with Value=B] on Node 1
Client A => Writes: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1
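(For concreteness, this is roughly how I would issue such a write with a Thrift-era client like pycassa, which as far as I understand sends multi-column inserts via batch_mutate; the keyspace, column family and host names below are just made up for the example:)

    # Rough sketch only; 'Keyspace1', 'Users' and 'node1:9160' are hypothetical.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('Keyspace1', ['node1:9160'])  # hypothetical keyspace/host
    cf = ColumnFamily(pool, 'Users')                    # hypothetical column family

    # Client A: both column updates for row key '1' go out in one mutation.
    cf.insert('1', {'A': 'C', 'B': 'D'})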
ATOMICITY:
According to the Cassandra docs, writes are atomic for the client doing the write:
the write above will either succeed completely or fail completely!?
Something like [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B]
(= half of the column updates succeeded,
but the other half was not yet applied/failed) cannot be the RESULT of the write in case of an error?
Is this correct?
ISOLATION:
Is it really true that even on ONE SINGLE NODE (here Node 1) writes are not isolated for someone reading the same ROW on the same node?
As described above, if Client A has updated half of the columns to be changed (here ColumnName=A with Value=C),
is it really true that another Client B connecting to Node 1 will then indeed see the record as
Client B => Reads: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B] on Node 1
And some milliseconds later, reading again, it will see:
Client B => Reads: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1
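(Again just to illustrate what I mean, a reader like Client B could hunt for such a half-applied state with a small polling loop, reusing the hypothetical cf handle from the sketch above:)

    # Client B: poll the row and report any state where only one of the two
    # columns has picked up the new value (a "partly changed" read).
    import time

    for _ in range(1000):
        row = cf.get('1', columns=['A', 'B'])
        if row.get('A') == 'C' and row.get('B') == 'B':
            print('observed a partly changed row:', dict(row))
        time.sleep(0.001)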
Why are updates not isolated on a per-node basis?
To me this seems to be quite easy and cheap to do?
Why is there no in-memory lock held on Node 1, marking that KEY=1 is currently in the process of being updated, so that a read can wait for this write to finish?
(This would be only a very small overhead, as the lock is held locally in memory on Node 1, and it could be configurable whether the reading client respects the "lock" or simply reads a dirty value?
So it would be something like a "configurable isolation level"? If I need high performance I ignore/disable the locks, and if I need isolation on a per-node basis and accept the
negative performance impact, then I wait for the in-memory lock (on Node 1) to be released? (Note, I am not talking about clustered/distributed locks, but locks that guarantee on one single machine that a write is isolated on a per-row-key basis!)
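(Just to make clear what I mean, here is a rough, purely illustrative sketch of such a node-local, per-row-key lock with an optional dirty-read bypass; this is obviously not anything Cassandra provides:)

    # Illustration only: a node-local, per-row-key lock with an optional
    # "dirty read" bypass -- the "configurable isolation" I have in mind.
    import threading
    from collections import defaultdict

    row_locks = defaultdict(threading.Lock)   # one lock per row key, in memory
                                              # (lock-creation race ignored for brevity)

    def write_row(store, key, columns):
        with row_locks[key]:                  # the writer holds the key's lock
            store.setdefault(key, {}).update(columns)

    def read_row(store, key, allow_dirty=False):
        if allow_dirty:
            return dict(store.get(key, {}))   # may observe a half-applied update
        with row_locks[key]:                  # otherwise wait for the writer
            return dict(store.get(key, {}))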
Or is isolation different in regard to "changing existing columns" versus operations that "append/add columns"? So that changing columns (as in the example above) is isolated, but adding new columns is not? From my point of view, changing existing columns must be isolated/atomic... adding columns is not so much required to be isolated...
The reason why I am asking: if things like those depicted above can happen, i.e. reads really can read partially changed records, what
use cases are then legitimate for NoSQL/Cassandra? This means any kind of random column data can exist on a per-row basis, as the columns
might be in any random read/write state? I hardly know of any data or use case that is allowed to be changed "arbitrarily" on a per-row
basis.
Thank you very much!!!
jens
Answers (3)
Because Cassandra heavily emphasizes denormalization for performance (distributed joins do not scale, and yes, I'm using "scale" correctly here -- distributed joins are O(N) in the number of machines in the cluster), write volume to a "materialized view" row can be VERY high. So row-level locking would introduce unacceptable contention for many real-world workloads.
The page you linked to says:
I'm not sure of the reason for this, but I suspect that the required locking would be too coarse and would affect performance too much. Bear in mind that all updates are written first to a commit log, and then immediately to SSTables on disk in most cases (unless you set a very low consistency level), so purely memory-based locks are not necessarily helpful.
A few use cases where this does not matter:
A chat from the IRC log:
itissid:
Ok so http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic says that
its a special case
But if we do a normal write they are isolated?
thobbs:
the column is the unit of isolation
nothing above that is isolated (yet)
itissid:
Ok gotcha
thobbs:
there's work to isolate writes to a single row
driftx:
it's done for 1.1
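(If the row-level isolation work mentioned above behaves as described once it ships in 1.1, a reader on the node should only ever observe one of the two complete states; a tiny sanity check along those lines, reusing the hypothetical cf handle from the question's sketch, might be:)

    # Sketch only: with row-level isolation, a single-row batch mutation should
    # never be visible half-applied, so the (A, B) pair read back should always
    # be one complete state or the other.
    old_state = {'A': 'A', 'B': 'B'}
    new_state = {'A': 'C', 'B': 'D'}

    row = dict(cf.get('1', columns=['A', 'B']))
    assert row == old_state or row == new_state, row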