Using Cassandra's RandomPartitioner with ordered data

Posted on 2024-11-26 05:09:15

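I want to store roughly a billion pieces of data in Cassandra. The items are ordered by time, and one of the main queries I need is to fetch, in order, the items between two points in time. I would really prefer to use the RandomPartitioner if possible. Is there a way to do this in Cassandra?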

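At first, coming from SQL, I thought I should create each event as a row, but then I realized that was the wrong way to think about it and that I should really be using columns. Columns in Cassandra seem to be ordered, but I'm confused about how ordered they are. If I use the time as the column name, is there a way to get all the columns from one time to another, in order?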

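Another thing I looked at was the secondary indexes feature in 0.7, but I had trouble finding documentation on whether it can be used to read a range of items in order.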

All I want is the Cassandra equivalent of this SQL: "Select * from Stuff where date > X and date < Y order by date asc". How can I do that?


趁微风不噪 2024-12-03 05:09:15

The partitioner only affects the distribution of keys around the ring, not the order of columns within a key. Columns are always ordered according to the Column Comparator defined for the column family.
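To illustrate the point about key placement, here is a rough sketch in Python. It assumes the RandomPartitioner can be modeled as "token = MD5 hash of the row key read as an integer"; the real token derivation differs in detail, and the `token` helper and the sample day keys are hypothetical. The idea is only that hash-based placement ignores the natural order of the keys, while saying nothing about how columns inside a row are sorted.

```python
import hashlib

def token(key: str) -> int:
    """Hypothetical stand-in for a hash-based partitioner token."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

# Row keys that are naturally ordered by date.
days = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]

# Position on the ring follows the hash, not the date order of the keys.
by_token = sorted(days, key=token)

for day in by_token:
    print(day, hex(token(day))[:12])
```

Because rows land at hash-determined positions, range scans across row keys are not meaningful with this partitioner, which is why the time ordering has to live in the column names instead.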

You can call get_slice with a SlicePredicate that specifies a SliceRange to get all the columns of a key within a range.

To model your data, you can create 1 row for each day (or suitable time shard) and have a column for each piece of data. Something like,

"yyyy-mm-dd" : {  #key, one for each day
    timeStampMillis1:dataid1 : "value1" # one column for each piece of data
    timeStampMillis2:dataid2 : "value2" 
    timeStampMillis3:dataid3 : "value3" 
}

The column names should be binary, using the binary comparator. The first 8 bytes are the timestamp, while the rest of the bytes are the id of the data.
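A minimal sketch of that column-name layout, assuming the timestamp is a non-negative millisecond count encoded big-endian so that bytewise comparison matches numeric order. The `column_name` helper and the sample ids are hypothetical, not part of any Cassandra client.

```python
import struct

def column_name(timestamp_ms: int, data_id: bytes) -> bytes:
    """Build a column name: 8-byte big-endian timestamp, then the raw id bytes."""
    return struct.pack(">Q", timestamp_ms) + data_id

names = [
    column_name(1_700_000_002_000, b"c"),
    column_name(1_700_000_000_000, b"b"),
    column_name(1_700_000_000_000, b"a"),
]

# Plain bytewise sorting stands in for what a binary comparator would do.
ordered = sorted(names)
timestamps = [struct.unpack(">Q", n[:8])[0] for n in ordered]
print(timestamps)
```

The id suffix keeps two items with the same timestamp from overwriting each other, and it only acts as a tie-breaker in the sort order.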

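Assuming X and Y are on the same day, to find all items between X and Y, do a get_slice on the day key, with a SlicePredicate whose SliceRange specifies a start of X and a finish of Y+1. Both start and finish are byte arrays of 8 bytes. The finish is Y+1 rather than Y because a column whose timestamp is exactly Y also carries an id suffix, so it sorts after the bare 8-byte value of Y.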
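The slice bounds can be checked with an in-memory model. This sketch does not call Cassandra; it treats one row as a sorted list of column names and assumes the slice is inclusive at both ends under bytewise ordering. `pack_ts` and `slice_columns` are hypothetical helpers.

```python
import bisect
import struct

def pack_ts(timestamp_ms: int) -> bytes:
    """Encode a timestamp as 8 big-endian bytes."""
    return struct.pack(">Q", timestamp_ms)

def slice_columns(sorted_names, start: bytes, finish: bytes):
    """Return the names with start <= name <= finish (inclusive slice model)."""
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]

# One simulated row: column name = timestamp + id.
row = sorted(
    pack_ts(t) + data_id
    for t, data_id in [(100, b"a"), (150, b"b"), (200, b"c"), (201, b"d")]
)

x, y = 150, 200
hits = slice_columns(row, pack_ts(x), pack_ts(y + 1))
hit_ts = [struct.unpack(">Q", n[:8])[0] for n in hits]
print(hit_ts)
```

In this model the column at timestamp 201 is excluded because its name is the 8-byte value of 201 plus a non-empty id, which sorts after the bare 8-byte finish. Note that both X and Y themselves are included, so the strict `>` and `<` of the SQL query would need the bounds adjusted or the edge items filtered out by the client.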

To find data over multiple days, read from multiple keys.
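For a range that spans several days, the client needs the list of day keys to read. A small sketch, assuming the rows are keyed by UTC calendar day in "yyyy-mm-dd" form; the `day_keys` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def day_keys(start_ms: int, end_ms: int) -> list:
    """List the 'yyyy-mm-dd' row keys covering [start_ms, end_ms], in UTC."""
    start = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc).date()
    end = datetime.fromtimestamp(end_ms // 1000, tz=timezone.utc).date()
    keys = []
    day = start
    while day <= end:
        keys.append(day.strftime("%Y-%m-%d"))
        day += timedelta(days=1)
    return keys

x_ms = int(datetime(2024, 1, 30, 23, 0, tzinfo=timezone.utc).timestamp() * 1000)
y_ms = int(datetime(2024, 2, 1, 1, 0, tzinfo=timezone.utc).timestamp() * 1000)
keys = day_keys(x_ms, y_ms)
print(keys)
```

Reading the keys in this order and concatenating the per-row slices keeps the overall result sorted by time, since each row's columns are already ordered.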
