Can you sort a GET on a Cassandra column family by the timestamp created for each column entry rather than by the column key?

Posted on 2024-11-15 06:18:40

Basically I have a 'thread line' where new threads are created with a TimeUUID as the key. This obviously makes sorting new threads quite easy, especially when querying for, say, the latest 20 threads.

My problem is that when a new 'post' is made to a thread, I want to 'bump' that thread to the front of the 'thread line'. How can I do this so that queries still return threads in the right order, without duplicates?

The only way I can see this working is if, rather than sorting the column family by TimeUUID, I sort it by insertion timestamp. I could then use the unique thread IDs as column keys and retrieve them in the order they were inserted or reinserted rather than by TimeUUID. Is this possible, or am I missing a simple trick that allows for this? As far as I know, you have to set a particular comparator, or otherwise it defaults to bytes?

Comments (1)

江挽川 2024-11-22 06:18:40

Columns within a row are always sorted by name with the given comparator. You cannot sort by timestamp or value or anything else, or Cassandra would not be able to merge multiple updates to the same column correctly.
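To make the point concrete, here is a toy model in Python (not Cassandra's actual implementation; the `Column` class and `merge` function are hypothetical names, and tie-breaking on equal timestamps is ignored). It sketches why the timestamp is reserved for reconciling writes to the same column name, while ordering comes only from the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write timestamp, used only to reconcile conflicting writes


def merge(replica_a, replica_b):
    """Merge two views of one row: for each column name, the newest write wins."""
    merged = {}
    for col in list(replica_a) + list(replica_b):
        current = merged.get(col.name)
        if current is None or col.timestamp > current.timestamp:
            merged[col.name] = col
    # Columns come back ordered by name (the comparator), never by timestamp.
    return sorted(merged.values(), key=lambda c: c.name)


a = [Column("thread-b", "old post", 1), Column("thread-a", "first post", 2)]
b = [Column("thread-b", "new post", 3)]
row = merge(a, b)
```

If columns were ordered by timestamp instead, the two writes to `thread-b` would land at different positions and there would be no single slot in which to reconcile them.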

As to your use case, I can think of two options.

The most similar to what you are doing now would be to create a second column family, ThreadMostRecentPosts, with timeuuid columns (you said "keys" but it sounds like you mean "columns"). When a new post arrives, delete the old most-recent column and add a new one.

This has two problems:

  • The unit of replication is the row, so having this grow indefinitely could be problematic. (Using expiring columns to age out no-longer-relevant thread information might help.)
  • You need a lock manager so that multiple posts to the same thread don't race and possibly leave multiple entries in this row.
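The race in the second point can be sketched with a toy in-memory row. This is only an illustration under stated assumptions: plain integers stand in for timeuuid column names, and `read_current`, `bump`, and the `latest` lookup are hypothetical helpers, not Cassandra calls.

```python
# Toy row for ThreadMostRecentPosts: column name (post time) -> thread ID.
row = {1: "t1"}      # thread t1 currently sits at time 1
latest = {"t1": 1}   # hypothetical lookup: thread ID -> its current column name


def read_current(thread_id):
    """Find the column that currently represents this thread."""
    return latest[thread_id]


def bump(thread_id, old_name, new_name):
    """Delete the old most-recent column and add the new one."""
    row.pop(old_name, None)
    row[new_name] = thread_id
    latest[thread_id] = new_name


# Two posts to t1 arrive together; both read the old column before either writes.
old_a = read_current("t1")
old_b = read_current("t1")
bump("t1", old_a, 2)
bump("t1", old_b, 3)  # deletes column 1 again, so column 2 is left behind

print(sorted(row.items()))  # t1 now appears twice in the row
```

Serializing the read and the write for a given thread (which is what the lock manager is for) would let the second writer see column 2 and delete it.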

I would suggest instead creating a row per day (for instance), whose columns are the thread IDs and whose values are the most recent post. Adding a new post just updates the value in that column; no delete/re-add is done, so the race is not a problem anymore. You don't get sorting for free anymore but that's okay because you're limiting it to a small enough set that you can do that sort in memory (say, yesterday's threads and today's).
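A minimal sketch of that layout, using a Python dict as a stand-in for the column family. The names `thread_line`, `record_post`, and `latest_threads` are hypothetical, and storing the post time as the column value is an assumption made so the in-memory sort has something to sort on. The read also keeps only the newest entry per thread, since a thread bumped today may still have a column in yesterday's row.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Toy stand-in for a column family: row key (day) -> {thread_id: latest post time}.
thread_line = defaultdict(dict)


def record_post(thread_id, posted_at):
    """Overwrite the thread's column in that day's row; no delete/re-add needed."""
    thread_line[posted_at.date().isoformat()][thread_id] = posted_at


def latest_threads(today, days=2, limit=20):
    """Read the last `days` daily rows and sort by most recent post in memory."""
    newest = {}
    for offset in range(days):
        row_key = (today - timedelta(days=offset)).isoformat()
        for thread_id, posted_at in thread_line.get(row_key, {}).items():
            if thread_id not in newest or posted_at > newest[thread_id]:
                newest[thread_id] = posted_at
    ranked = sorted(newest, key=newest.get, reverse=True)
    return ranked[:limit]


record_post("t1", datetime(2024, 11, 14, 9, 0))
record_post("t2", datetime(2024, 11, 14, 10, 0))
record_post("t1", datetime(2024, 11, 15, 8, 0))  # a new reply bumps t1
front_page = latest_threads(date(2024, 11, 15))
```

Here `front_page` lists `t1` ahead of `t2`, each exactly once, even though `t1` has columns in both daily rows.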

(Finally, I would add that I can say from experience that having a cutoff past which old threads don't get bumped to the front by a new reply is a Good Thing.)
