How do range queries on Cassandra clustering keys work?

Posted on 2025-02-07 03:28:18

According to the official documentation:
Clustering columns order data within a partition. When a table has multiple clustering columns the data is stored in nested sort order.
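The following toy model may help picture "nested sort order". It is plain Python, not Cassandra code. It assumes two hypothetical clustering columns, `day` and `ts`. Rows are ordered by `day` first, and rows that share the same `day` are then ordered by `ts`.

```python
# Toy model, not Cassandra code: one partition whose rows have two
# hypothetical clustering columns, (day, ts), plus a regular column.
rows = [
    ("2022-06-25", "01:00", "disk full"),
    ("2022-06-24", "04:00", "cpu high"),
    ("2022-06-24", "03:30", "mem low"),
    ("2022-06-25", "00:15", "net down"),
]

# Nested sort order: compare by the first clustering column, and only
# when that ties, by the second. Python tuple comparison does exactly this.
stored = sorted(rows, key=lambda r: (r[0], r[1]))

for day, ts, alert in stored:
    print(day, ts, alert)
# 2022-06-24 03:30 mem low
# 2022-06-24 04:00 cpu high
# 2022-06-25 00:15 net down
# 2022-06-25 01:00 disk full
```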

Suppose we have a simple time-series table:

CREATE TABLE alerts_by_year(
  year int,
  ts timestamp,
  alert text,
  PRIMARY KEY ((year), ts)
);

A simple query that gets the events within a time range:

SELECT * FROM alerts_by_year
  WHERE year=2022
  AND ts >'2022-06-24 03:11:00'
  AND ts <'2022-06-24 04:11:00';

What is the algorithmic complexity of finding this range via the "ts" clustering key?
Is it constant time or O(n)?
Does it depend on the type of storage used: memtable or SSTable?

How does it work, then? Are we simply iterating through the "ts" clustering keys until we find the required range?

Comments (1)

固执像三岁 2025-02-14 03:28:18

Clustering columns are stored in sorted order. If you don't explicitly specify the order when you create a table, the clustering columns will be sorted in ascending order.

In your case, the following table option is automatically added to your table's definition:

WITH CLUSTERING ORDER BY (ts ASC)

So the full table schema is effectively:

CREATE TABLE alerts_by_year(
  year int,
  ts timestamp,
  alert text,
  PRIMARY KEY ((year), ts)
) WITH CLUSTERING ORDER BY (ts ASC);
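The sketch below roughly illustrates what the clustering order option controls. It sorts a few hypothetical `ts` values the way each order would lay them out within the partition. This is plain Python sorting, not Cassandra code.

```python
from datetime import datetime

# Hypothetical ts values for three rows in the year=2022 partition,
# listed in the (arbitrary) order they were written.
writes = [
    datetime(2022, 6, 24, 4, 0),
    datetime(2022, 6, 24, 3, 30),
    datetime(2022, 6, 24, 3, 45),
]

# WITH CLUSTERING ORDER BY (ts ASC): oldest first (the default).
asc_layout = sorted(writes)

# WITH CLUSTERING ORDER BY (ts DESC): newest first.
desc_layout = sorted(writes, reverse=True)

print([t.strftime("%H:%M") for t in asc_layout])   # ['03:30', '03:45', '04:00']
print([t.strftime("%H:%M") for t in desc_layout])  # ['04:00', '03:45', '03:30']
```

In both layouts the partition stays sorted, so a range on `ts` is still one contiguous run of rows. The order only decides which end of that run comes first.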

Since the rows in each partition are sorted in chronological order, from the oldest to the newest timestamp, a range query on the ts column reads the matching rows sequentially, one row at a time, until the range condition is no longer satisfied.
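The following minimal sketch makes the complexity question concrete. It models one partition as an ascending Python list of timestamps, which is an assumption for illustration and not Cassandra's actual memtable or SSTable code. Because the list is sorted, the first row with `ts > lower bound` can be found by binary search. The matching rows are then read one by one, as described above. `range_exclusive` is a hypothetical helper name.

```python
import bisect
from datetime import datetime, timedelta

def range_exclusive(sorted_ts, lo, hi):
    """Hypothetical helper: return every t with lo < t < hi from an
    ascending list, mirroring `ts > lo AND ts < hi` in the question."""
    # Step 1: binary search for the first index whose value is > lo.
    # This costs O(log n) comparisons instead of scanning from row 0.
    i = bisect.bisect_right(sorted_ts, lo)
    # Step 2: read forward one row at a time and stop at the first
    # value >= hi. This costs O(k) for the k rows that match.
    result = []
    while i < len(sorted_ts) and sorted_ts[i] < hi:
        result.append(sorted_ts[i])
        i += 1
    return result

# Toy partition for year=2022: one alert per minute on 2022-06-24.
day = datetime(2022, 6, 24)
partition = [day + timedelta(minutes=m) for m in range(24 * 60)]

hits = range_exclusive(partition,
                       datetime(2022, 6, 24, 3, 11),
                       datetime(2022, 6, 24, 4, 11))
print(len(hits), hits[0].time(), hits[-1].time())  # 59 03:12:00 04:10:00
```

In this model, the total cost is O(log n + k) for a partition of n rows with k matches. That is neither constant time nor a full O(n) scan. This thread does not say how Cassandra's real storage structures locate the starting row, so treat the sketch as a mental model only.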

Note that the drivers page through the results. For example, the Java driver returns the first 5000 rows by default. Your app then needs to retrieve the "next page" to get the next set of rows. Cheers!
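Here is a toy model of that paging loop. `fetch_page` and `iterate_all` are hypothetical stand-ins, not any driver's API. `fetch_page` represents one request/response round trip: it returns at most `page_size` rows (5000, matching the Java driver default mentioned above) plus a resume token. The caller keeps asking for pages until the token is `None`.

```python
from typing import Iterator, List, Optional, Tuple

def fetch_page(rows: List[int], page_size: int,
               paging_state: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Hypothetical stand-in for one query round trip (not a driver API).

    Returns at most page_size rows plus a resume token, or None as the
    token once no rows remain.
    """
    start = paging_state if paging_state is not None else 0
    page = rows[start:start + page_size]
    next_start = start + page_size
    return page, (next_start if next_start < len(rows) else None)

def iterate_all(rows: List[int], page_size: int = 5000) -> Iterator[int]:
    """Yield every row, requesting the next page only after the current
    one is used up (a rough model of a paging client)."""
    state: Optional[int] = None
    while True:
        page, state = fetch_page(rows, page_size, state)
        yield from page
        if state is None:
            return

# 12,000 matching rows arrive as pages of 5000, 5000 and 2000.
matching_rows = list(range(12_000))
page_sizes = []
state = None
while True:
    page, state = fetch_page(matching_rows, 5000, state)
    page_sizes.append(len(page))
    if state is None:
        break
print(page_sizes)  # [5000, 5000, 2000]
```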
