Cassandra = atomicity/isolation of column updates on a single row on a single node?
Sorry to have to ask something about Cassandra again; I would very much appreciate your advice:
I have read this: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic and am completely lost and wondering:
Is it really true that in Cassandra WRITES on ONE SINGLE NODE for ONE ROW-KEY (with MANY COLUMNS to be updated in the SAME COLUMNFAMILY, using batch_mutate) are NOT ISOLATED against a READ on the SAME NODE
to the SAME ROW-KEY'S COLUMNS, so there is no guarantee that a read does not read "partly changed data"? Example:
Current Status: [KEY=1 , ColumnName=A with Value=A , ColumnName=B with Value=B] on Node 1
Client A => Writes: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1
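(For concreteness, this is roughly how I would issue such a write with a Thrift-era client like pycassa, which as far as I understand sends multi-column inserts via batch_mutate; the keyspace, column family and host names below are just made up for the example:)

    # Rough sketch only; 'Keyspace1', 'Users' and 'node1:9160' are hypothetical.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('Keyspace1', ['node1:9160'])  # hypothetical keyspace/host
    cf = ColumnFamily(pool, 'Users')                    # hypothetical column family

    # Client A: both column updates for row key '1' go out in one mutation.
    cf.insert('1', {'A': 'C', 'B': 'D'})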
ATOMICITY:
According to the Cassandra docs, writes are atomic for the client doing the write:
the write above will either succeed completely or fail completely!?
Something like [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B]
(= half of the column updates succeeded,
but the other half was not yet applied/failed) cannot be the RESULT of the write in case of an error?
Is this correct?
ISOLATION:
Is it really true that even on ONE SINGLE NODE (here Node 1) writes are not isolated for someone reading the same ROW on the same node?
As described above, if Client A has updated half of the columns to be changed (here ColumnName=A with Value=C),
is it really true that another Client B connecting to Node 1 will then indeed see the record as
Client B => Reads: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=B] on Node 1
And some milliseconds later, reading again, it will see:
Client B => Reads: [KEY=1 , ColumnName=A with Value=C , ColumnName=B with Value=D] on Node 1
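(Again just to illustrate what I mean, a reader like Client B could hunt for such a half-applied state with a small polling loop, reusing the hypothetical cf handle from the sketch above:)

    # Client B: poll the row and report any state where only one of the two
    # columns has picked up the new value (a "partly changed" read).
    import time

    for _ in range(1000):
        row = cf.get('1', columns=['A', 'B'])
        if row.get('A') == 'C' and row.get('B') == 'B':
            print('observed a partly changed row:', dict(row))
        time.sleep(0.001)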
Why are updates not isolated on a per-node basis?
To me this seems to be quite easy and cheap to do?
Why is there no in-memory lock held on Node 1, marking that KEY=1 is currently in the process of being updated, so that a read can wait for this write to finish?
(This would be only a very small overhead, as the lock is held locally in memory on Node 1, and it could be configurable whether the reading client respects the "lock" or simply reads a dirty value?
So it would be something like a "configurable isolation level"? If I need high performance I ignore/disable the locks, and if I need isolation on a per-node basis and accept the
negative performance impact, then I wait for the in-memory lock (on Node 1) to be released? (Note, I am not talking about clustered/distributed locks, but locks that guarantee on one single machine that a write is isolated on a per-row-key basis!)
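(Just to make clear what I mean, here is a rough, purely illustrative sketch of such a node-local, per-row-key lock with an optional dirty-read bypass; this is obviously not anything Cassandra provides:)

    # Illustration only: a node-local, per-row-key lock with an optional
    # "dirty read" bypass -- the "configurable isolation" I have in mind.
    import threading
    from collections import defaultdict

    row_locks = defaultdict(threading.Lock)   # one lock per row key, in memory
                                              # (lock-creation race ignored for brevity)

    def write_row(store, key, columns):
        with row_locks[key]:                  # the writer holds the key's lock
            store.setdefault(key, {}).update(columns)

    def read_row(store, key, allow_dirty=False):
        if allow_dirty:
            return dict(store.get(key, {}))   # may observe a half-applied update
        with row_locks[key]:                  # otherwise wait for the writer
            return dict(store.get(key, {}))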
Or is isolation different in regard to "changing existing columns" versus operations that "append/add columns"? So that changing columns (as in the example above) is isolated, but adding new columns is not? From my point of view, changing existing columns must be isolated/atomic... adding columns is not so much required to be isolated...
The reason why I am asking: if things like those depicted above can happen, i.e. reads really can read partially changed records, what
use cases are then legitimate for NoSQL/Cassandra? This means any kind of random column data can exist on a per-row basis, as the columns
might be in any random read/write state? I hardly know of any data or use case that is allowed to be changed "arbitrarily" on a per-row
basis.
Thank you very much!!!
jens
Answers (3)
Because Cassandra heavily emphasizes denormalization for performance (distributed joins do not scale, and yes, I'm using "scale" correctly here -- distributed joins are O(N) in the number of machines in the cluster), write volume to a "materialized view" row can be VERY high. So row-level locking would introduce unacceptable contention for many real-world workloads.
The page you linked to says:
I'm not sure of the reason for this, but I suspect that the required locking would be too coarse and would affect performance too much. Bear in mind that all updates are written first to a commit log, and then immediately to SSTables on disk in most cases (unless you set a very low consistency level), so purely memory-based locks are not necessarily helpful.
A few use cases where this does not matter:
A chat from the IRC log:
itissid:
Ok so http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic says that
its a special case
But if we do a normal write they are isolated?
thobbs:
the column is the unit of isolation
nothing above that is isolated (yet)
itissid:
Ok gotcha
thobbs:
there's work to isolate writes to a single row
driftx:
it's done for 1.1
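(If the row-level isolation work mentioned above behaves as described once it ships in 1.1, a reader on the node should only ever observe one of the two complete states; a tiny sanity check along those lines, reusing the hypothetical cf handle from the question's sketch, might be:)

    # Sketch only: with row-level isolation, a single-row batch mutation should
    # never be visible half-applied, so the (A, B) pair read back should always
    # be one complete state or the other.
    old_state = {'A': 'A', 'B': 'B'}
    new_state = {'A': 'C', 'B': 'D'}

    row = dict(cf.get('1', columns=['A', 'B']))
    assert row == old_state or row == new_state, row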