Using Cassandra RandomPartitioner for ordered data
I have about a billion pieces of data that I would like to store in Cassandra. The data items are ordered by time, and one of the main queries I'll be doing is to find the items between two points in time, in order. I'd really prefer to use the RandomPartitioner, if at all possible. Is there a way to do this in Cassandra?
At first, since I'm coming from SQL, I assumed I should create each event as a row, but then it occurred to me that I was thinking about it the wrong way and I should really use columns. Columns in Cassandra seem to be ordered, but I'm confused as to just how ordered they are. If I use a time as the column name, is there a way for me to get all of the columns from one time to another in order?
Another thing I looked at was the 0.7 feature of secondary indices, but I've had trouble finding documentation for whether I can use these to view a range of things in order.
All I want is the Cassandra equivalent of this SQL: "Select * from Stuff where date > X and date < Y order by date asc". How can I do this?
The partitioner only affects the distribution of keys around the ring, not the order of columns within a key. Columns are always ordered according to the Column Comparator defined for the column family.
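To make the distinction concrete, here is a small Python sketch. It assumes RandomPartitioner's token is the absolute value of the row key's MD5 digest read as a signed 128-bit integer (our reading of the 0.7-era implementation), and it models BytesType column ordering as plain byte-wise sorting. The day-style row keys are hypothetical.

```python
import hashlib

# Assumed token derivation for RandomPartitioner: MD5 of the row key,
# read as a signed big-endian integer, then made non-negative.
def random_partitioner_token(row_key: bytes) -> int:
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

# Hypothetical day-shard row keys, listed in chronological order.
keys = [b"2010-10-18", b"2010-10-19", b"2010-10-20", b"2010-10-21"]

# Placement on the ring follows the hashed token, not the key itself.
ring_order = sorted(keys, key=random_partitioner_token)

# Inside a single row, columns stay sorted by the comparator; for
# BytesType that is lexicographic byte order, whatever the partitioner.
row_columns = {b"\x00\x02": "b", b"\x00\x01": "a", b"\x00\x03": "c"}
column_order = sorted(row_columns)

for k in keys:
    print(k.decode(), random_partitioner_token(k))
print("ring order:", [k.decode() for k in ring_order])
print("column order:", column_order)
```

Nothing makes consecutive days land next to each other on the ring, so a time range cannot be read as a key range under RandomPartitioner; the column order inside each row is unaffected.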
You can call get_slice with a SlicePredicate that specifies a SliceRange to get all the columns of a row whose names fall within a given range.
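The slice semantics can be mimicked in memory to see what a SliceRange returns. This is not the Thrift client: `slice_range` is a hypothetical helper that assumes inclusive start and finish, an empty bound meaning "open-ended", and a `count` cap (100 being the Thrift default), applied to column names sorted byte-wise as BytesType would sort them. Reversed slices are not modeled.

```python
import bisect

# In-memory stand-in for one row; NOT the Thrift client. Column names are
# sorted byte-wise, which is how a BytesType comparator orders them.
def slice_range(row: dict, start: bytes = b"", finish: bytes = b"", count: int = 100):
    """Mimic a forward SliceRange: inclusive bounds, empty bound = open end."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start) if start else 0
    hi = bisect.bisect_right(names, finish) if finish else len(names)
    return [(name, row[name]) for name in names[lo:hi][:count]]

row = {b"d": 4, b"b": 2, b"a": 1, b"c": 3}
print(slice_range(row, b"b", b"c"))   # both bounds are inclusive
print(slice_range(row, b"bb"))        # open-ended finish
print(slice_range(row, count=2))      # result capped by count
```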
To model your data, you can create one row for each day (or another suitable time shard) and store one column for each piece of data.
The column names should be binary, using the binary (BytesType) comparator: the first 8 bytes are the timestamp, and the remaining bytes are the id of the data item. The id suffix keeps names unique when two items share a timestamp.
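A sketch of that layout, assuming millisecond Unix timestamps, UTC day boundaries and a `YYYY-MM-DD` row key. The helpers `day_row_key`, `column_name` and `insert` are hypothetical names, and a plain dict stands in for the column family.

```python
import struct
from datetime import datetime, timezone

# Hypothetical helpers: millisecond Unix timestamps, UTC days and a
# "YYYY-MM-DD" row key are assumptions, not part of the original answer.
def day_row_key(ts_millis: int) -> bytes:
    """Row key of the one-day shard that contains ts_millis (UTC)."""
    day = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return day.strftime("%Y-%m-%d").encode("ascii")

def column_name(ts_millis: int, item_id: bytes) -> bytes:
    """8-byte big-endian timestamp followed by the item's id bytes.

    Big-endian unsigned packing makes byte-wise (BytesType) order agree
    with numeric order for non-negative timestamps.
    """
    return struct.pack(">Q", ts_millis) + item_id

# A plain dict of rows stands in for the column family.
column_family: dict[bytes, dict[bytes, str]] = {}

def insert(ts_millis: int, item_id: bytes, value: str) -> None:
    row = column_family.setdefault(day_row_key(ts_millis), {})
    row[column_name(ts_millis, item_id)] = value

# 1287532800000 ms is 2010-10-20 00:00:00 UTC.
insert(1287532800000, b"evt-1", "first event of the day")
insert(1287532800000, b"evt-2", "same millisecond, different id")
print(sorted(column_family))
```

Packing the timestamp big-endian (`>Q`) is what makes this work: with little-endian bytes, a timestamp of 255 would sort after 256 under a byte-wise comparator.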
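Assuming X and Y are on the same day, to find all items between X and Y, do a get_slice on that day's key with a SlicePredicate whose SliceRange specifies a start of X and a finish of Y+1. Both start and finish are 8-byte arrays holding the timestamps in big-endian order, so that byte-wise comparison matches numeric order. Because each column name is the 8-byte timestamp followed by the id, a bare 8-byte X sorts before every column stamped X, and a bare 8-byte Y+1 sorts after every column stamped Y, so the slice returns every item from X through Y.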
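The same-day query, sketched against an in-memory row. `get_slice` here is a hypothetical stand-in that applies an inclusive range to byte-sorted names (not the Thrift call), the `timestamp + id` column layout is the one described above, and the small integer timestamps are purely illustrative.

```python
import bisect
import struct

def column_name(ts_millis: int, item_id: bytes) -> bytes:
    # Assumed layout: 8-byte big-endian timestamp, then the item id.
    return struct.pack(">Q", ts_millis) + item_id

def get_slice(row: dict, start: bytes, finish: bytes):
    # Stand-in for get_slice with an inclusive SliceRange over
    # BytesType-ordered column names.
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(n, row[n]) for n in names[lo:hi]]

def items_between(row: dict, x: int, y: int):
    start = struct.pack(">Q", x)       # sorts before every column stamped x
    finish = struct.pack(">Q", y + 1)  # sorts after every column stamped y,
                                       # and before every column stamped y + 1
    return get_slice(row, start, finish)

row = {
    column_name(100, b"a"): "early",
    column_name(200, b"b"): "at X",
    column_name(250, b"c"): "middle",
    column_name(300, b"d"): "at Y",
    column_name(301, b"e"): "after Y",
}
print([v for _, v in items_between(row, 200, 300)])
```

Since the slice is inclusive, items stamped exactly X or Y come back too. To match the SQL's strict `date > X and date < Y`, one option is a start of X+1 and a finish of Y (both 8 bytes), since a bare 8-byte Y sorts before every column stamped Y.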
To find data spanning multiple days, read from multiple keys: slice each day's row in chronological order and concatenate the results.
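A sketch of the multi-day read, assuming millisecond UTC timestamps, hypothetical `YYYY-MM-DD` row keys, the `timestamp + id` column layout, and an in-memory dict in place of the cluster. It walks the day keys oldest first and applies the same start and finish to every row; middle days fall entirely inside the bounds, so only the first and last rows are actually trimmed.

```python
import bisect
import struct
from datetime import datetime, timezone

DAY_MS = 86_400_000  # milliseconds per UTC day

def day_row_key(ts_millis: int) -> bytes:
    # Hypothetical row-key format: the UTC date as "YYYY-MM-DD".
    d = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return d.strftime("%Y-%m-%d").encode("ascii")

def column_name(ts_millis: int, item_id: bytes) -> bytes:
    return struct.pack(">Q", ts_millis) + item_id

def get_slice(row: dict, start: bytes, finish: bytes):
    # Inclusive range over byte-sorted column names (stand-in, not Thrift).
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(n, row[n]) for n in names[lo:hi]]

def day_keys(x: int, y: int) -> list:
    # Row keys of every UTC day touched by [x, y], oldest first.
    day = x - x % DAY_MS
    keys = []
    while day <= y:
        keys.append(day_row_key(day))
        day += DAY_MS
    return keys

def items_between(column_family: dict, x: int, y: int):
    start, finish = struct.pack(">Q", x), struct.pack(">Q", y + 1)
    result = []
    for key in day_keys(x, y):  # one separate row read per day
        result.extend(get_slice(column_family.get(key, {}), start, finish))
    return result

D = 1287532800000  # 2010-10-20 00:00:00 UTC
cf = {}
for ts, iid, v in [(D - 1000, b"a", "Oct 19 late"), (D + 5, b"b", "Oct 20"),
                   (D + DAY_MS + 7, b"c", "Oct 21"), (D + 2 * DAY_MS + 1, b"d", "Oct 22")]:
    cf.setdefault(day_row_key(ts), {})[column_name(ts, iid)] = v
print([v for _, v in items_between(cf, D - 2000, D + DAY_MS + 10)])
```

Under RandomPartitioner these day rows live on unrelated nodes, which is why the loop issues one lookup per key rather than a single key-range scan.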