Cassandra and Tombstones: Creating a Row, Deleting the Row, Recreating the Row = Performance?
Could someone please explain what effect the following process has on tombstones:
1.) Creating a "row" with key "1" ("fields": user, password, date)
2.) Deleting the "row" with key "1"
3.) Creating a "row" with key "1" ("fields": user, password, logincount)
The sequence is executed sequentially in a single thread, so the steps follow each other quickly, with no long pauses between them.
My Questions:
1.) What effect does this have on tombstone creation? After step 2.) a tombstone exists. But what happens to that tombstone if a new, slightly changed row is created again under the same key (step 3.)? Can Cassandra "reanimate" the tombstone efficiently?
2.) How much worse is the process described above compared to deleting only the "date" field and then creating the "logincount" field? (Targeted deletion is most likely more performant. On the other hand, it is much more complex to find out which fields have been deleted than to simply delete the whole row and recreate it from scratch with the correct data...)
Remark/Update:
What I actually want to do is set the "date" field to null. But this does not work in Cassandra: nulls are not allowed for values. So if I want to set it to null, I have to delete it. But I am afraid that this explicit second delete request will have a negative performance impact (compared to just setting it to null)... And as described, I first have to find out which fields were nullified and previously had a value (I have to compare all attributes for this state...)
Thank you very much!
Markus
Comments (3)
I would like to belatedly clarify some things here.
First, with respect to Theodore's answer:
1) All rows have a tombstone field internally for simplicity, so when the new row is merged with the tombstone, it just becomes "row with new data, that also remembers that it was once deleted at time X." So there is no real penalty in that respect.
2) It is incorrect to say that "If you create and delete a column value rapidly enough that no flush takes place in the middle... the tombstone [is] simply discarded"; tombstones are always persisted, for correctness. Perhaps the situation Theodore was thinking of was the other way around: if you delete a column and then insert a new value for it, the new column replaces the tombstone (just as it would replace any obsolete value). This is different from the row case, since the Column is the "atom" of storage.
3) Given (2), the delete-row-and-insert-new-one is likely to be more performant if there are many columns to be deleted over time. But for a single column the difference is negligible.
Finally, regarding Tyler's answer, in my opinion it is more idiomatic to simply delete the column in question than to change its value to an empty [byte]string.
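To make point 1) more concrete, here is a minimal Python sketch of the idea that a row "remembers that it was once deleted at time X". It is a toy model, not Cassandra's actual storage code: the `Row` class, its method names, and the integer timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Row:
    """Toy model of a row with a row-level tombstone (hypothetical, not Cassandra code)."""
    # Timestamp of the most recent row-level delete, or None if never deleted.
    deleted_at: Optional[int] = None
    # Column name -> (write timestamp, value).
    columns: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def insert(self, name: str, value: str, ts: int) -> None:
        # Keep the cell with the newest timestamp.
        current = self.columns.get(name)
        if current is None or ts > current[0]:
            self.columns[name] = (ts, value)

    def delete_row(self, ts: int) -> None:
        # The row is not removed; it only records when it was deleted.
        if self.deleted_at is None or ts > self.deleted_at:
            self.deleted_at = ts

    def live_columns(self) -> Dict[str, str]:
        # A column is visible only if it was written after the row-level delete.
        return {
            name: value
            for name, (ts, value) in self.columns.items()
            if self.deleted_at is None or ts > self.deleted_at
        }


row = Row()
for name, value in [("user", "markus"), ("password", "secret"), ("date", "2011-01-01")]:
    row.insert(name, value, ts=1)      # step 1: create the row
row.delete_row(ts=2)                   # step 2: delete the row
for name, value in [("user", "markus"), ("password", "secret"), ("logincount", "0")]:
    row.insert(name, value, ts=3)      # step 3: recreate the row

live = row.live_columns()
```

In this model the old "date" column stays hidden because its timestamp is older than the row-level delete, while the columns written in step 3 are visible. The tombstone timestamp itself is kept alongside the new data.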
1). If you delete the whole row, the tombstone is still kept and is not reanimated by the subsequent insertion in step 3. This is because the row may have received an insertion a long time ago (e.g. step 0: key "1", field "name"). In row "1", the column "name" needs to stay deleted, while the column "user" is reanimated.
2). If you create and delete a column value rapidly enough that no flush takes place in the middle, there is no performance impact. The column will be updated in-place in the Memtable, and the tombstone simply discarded. Only a single value will end up being written persistently to an SSTable.
However, if the Memtable is flushed to disk between steps 2 and 3, then the tombstone will be written to the resulting SSTable. A subsequent flush will write the new value to the next SSTable. This will make subsequent reads slower, since the column now needs to be read from both SSTables and reconciled. (Similarly if a flush occurs between steps 1 and 2.)
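The reconciliation step mentioned above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Cassandra's read path: each SSTable is modeled as a plain dict, a cell is a `(timestamp, value)` tuple, and a value of `None` stands in for a tombstone. The function name `read_column` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp, value); a value of None marks a tombstone in this toy model.
Cell = Tuple[int, Optional[str]]


def read_column(sstables: List[Dict[str, Cell]], name: str) -> Optional[str]:
    """Check every SSTable for the column and return the newest cell's value."""
    newest: Optional[Cell] = None
    for table in sstables:
        cell = table.get(name)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    # A tombstone (or no cell at all) means the column is not visible.
    return None if newest is None else newest[1]


# The tombstone from step 2 was flushed to one SSTable,
# and the new value from step 3 was flushed to the next one.
sstable_a = {"date": (2, None)}
sstable_b = {"date": (3, "2011-02-02")}

value_after_both = read_column([sstable_a, sstable_b], "date")
value_after_delete_only = read_column([sstable_a], "date")
```

The point of the sketch is that the read has to look at both tables and compare timestamps before it can answer, which is the extra work described above.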
Just set the "date" column to hold an empty string. That's what's typically used instead of null.
If you want to delete the column, just delete the column explicitly instead of deleting the entire row. The performance effect of this is similar to writing an empty string for the column value.
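For the concern in the question about finding out which fields were nullified, a small client-side diff may be enough. The following is only a sketch: `plan_update` is a hypothetical helper, and it assumes the old and new states are available as plain dicts in which `None` means "no value". How the resulting deletes and writes are sent to Cassandra depends on the client library and is not shown.

```python
from typing import Any, Dict, List, Tuple


def plan_update(old: Dict[str, Any], new: Dict[str, Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Return (columns to delete, columns to write) to move from old to new."""
    # Delete only columns that had a value before and have none now.
    to_delete = sorted(
        name for name, value in old.items()
        if value is not None and new.get(name) is None
    )
    # Write only columns whose non-null value is new or changed.
    to_write = {
        name: value for name, value in new.items()
        if value is not None and old.get(name) != value
    }
    return to_delete, to_write


old_state = {"user": "markus", "password": "secret", "date": "2011-01-01"}
new_state = {"user": "markus", "password": "secret", "date": None, "logincount": 1}

to_delete, to_write = plan_update(old_state, new_state)
```

With this approach only the "date" column is deleted and only "logincount" is written, so unchanged columns generate no extra writes or tombstones.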