Timeout on CQL COUNT() even when restricted by the partition key

Posted on 2025-01-24 10:43:38


I have a table in a Cassandra database that is populated. It probably has around 10,000 records. When I try to execute SELECT COUNT(*), my query times out. Surprisingly, it times out even when I restrict the query with the partition key. The table has a column that is filled with a lot of text. I can't see how that would be a problem, but I thought I'd mention it. Any suggestions?
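
For concreteness, the setup and queries look roughly like this (the table and column names below are placeholders, not my real schema):

-- Simplified, hypothetical version of the table: one partition key,
-- one clustering column, and a column holding large text values.
CREATE TABLE my_keyspace.events (
    device_id  text,        -- partition key
    event_time timestamp,   -- clustering column
    payload    text,        -- the column with a lot of text
    PRIMARY KEY (device_id, event_time)
);

-- Full-table count: times out.
SELECT COUNT(*) FROM my_keyspace.events;

-- Count restricted to a single partition: also times out.
SELECT COUNT(*) FROM my_keyspace.events WHERE device_id = 'sensor-42';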


2 Answers

澉约 2025-01-31 10:43:38


Doing a COUNT() of the rows in a partition shouldn't time out unless it contains thousands and thousands of rows. More likely, the query is timing out because the partition contains thousands of tombstones.

You haven't provided a lot of information in your question and ideally you should have included:

  • the table schema
  • a sample query

In any case, if you are storing queue-like datasets and deleting rows after they've been processed (because it's a queue), then you are generating lots of tombstones within the partition. Once you've reached the maximum tombstone_failure_threshold (the default is 100K tombstones), Cassandra will stop reading any more rows.
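
To make that concrete, here is a rough sketch of the queue anti-pattern (the table below is hypothetical, since you didn't post your schema): every processed-and-deleted row leaves a tombstone behind, and a later COUNT() over the partition still has to read past all of them. Turning tracing on in cqlsh shows the tombstones being scanned.

-- Hypothetical queue-like table
CREATE TABLE my_keyspace.task_queue (
    queue_name text,
    task_id    timeuuid,
    payload    text,
    PRIMARY KEY (queue_name, task_id)
);

-- A consumer deletes each task after processing it;
-- every one of these deletes writes a tombstone into the partition.
DELETE FROM my_keyspace.task_queue
 WHERE queue_name = 'emails'
   AND task_id = 5f2b3c10-12d3-11ee-be56-0242ac120002;

-- Later, counting the partition has to skip over all those tombstones.
-- With tracing enabled, the request trace reports how many live rows
-- and tombstone cells were read.
TRACING ON;
SELECT COUNT(*) FROM my_keyspace.task_queue WHERE queue_name = 'emails';
TRACING OFF;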

Unfortunately, it's hard to say what's happening in your case without the necessary details. Cheers!

金兰素衣 2025-01-31 10:43:38


A SELECT COUNT(*) needs to scan the entire database and can potentially take an extremely long time - much longer than the typical timeout. Ideally, such a query would be paged - periodically returning empty pages until the final page contains the count - to avoid timing out. But currently in Cassandra - and also in Scylla - this isn't done.
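
If you just need this one count to finish, a common stopgap (assuming you are running it from a reasonably recent cqlsh) is to raise the client-side request timeout, since the default is only a few seconds. Note that this doesn't make the scan any cheaper, and server-side timeouts configured in cassandra.yaml still apply:

# start cqlsh with a one-hour client-side request timeout (default is 10 seconds)
cqlsh --request-timeout=3600

-- then run the count as usual and let it grind through the scan
SELECT COUNT(*) FROM my_keyspace.my_table;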

As Erick noted in his reply, everything becomes worse if you also have a lot of tombstones: you said you only have 10,000 rows, but it's easy to imagine a use case where the data changes frequently and you actually have 100 deleted rows for each live row - so Cassandra needs to scan through 1 million rows (most of them already dead), not 10,000.

Another issue to consider is that when your cluster is very large, a scan usually contacts the nodes sequentially, and each node many times (depending on the number of vnodes), so the scan time on a very large cluster will be large even if there are just a few actual rows in the database. By the way, unlike a regular scan, an aggregation like COUNT(*) can actually be done internally in parallel. Scylla recently implemented this, and it speeds up counts (and other aggregations), but if I understand correctly, this feature is not in Cassandra.

Finally, you said: "Surprisingly, it times out even when I restrict the query with the partition key." The question is how you restricted the query with the partition key. If you restricted the partition key itself to a range, it will still be slow, because Cassandra still needs to scan all the partitions and compare their keys to the range. What you should have done is to restrict the token of the partition key, e.g., something like

where token(p) >= -9223372036854775808 and token(p) < ....
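
To spell that out a bit more (my_table and p below are placeholders for your table and its partition-key column): you split the full Murmur3 token range into sub-ranges, count each sub-range with a bounded range query, and add the partial results up on the client side. Two halves are shown here just for illustration; in practice you would use many smaller sub-ranges and issue them in parallel.

-- First half of the token range
SELECT COUNT(*) FROM my_keyspace.my_table
 WHERE token(p) >= -9223372036854775808
   AND token(p) < 0;

-- Second half of the token range
SELECT COUNT(*) FROM my_keyspace.my_table
 WHERE token(p) >= 0
   AND token(p) <= 9223372036854775807;

-- The total row count is the sum of the two results.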