针对单一记录的 Unnest 查询优化

发布于 2025-01-16 16:10:50 字数 1415 浏览 3 评论 0原文

我正在尝试优化我的查询,以防止内部客户只想返回一个结果*(及其关联的嵌套数据集)。我的目标是减少查询过程的大小。

然而,无论我查询的是 1 条记录(未嵌套的 48,000 长度数组)还是整个数据集(10,000 条记录,未嵌套的数组总长度为 514,048,748),它似乎都是完全相同的值!

因此,我的表对一条记录查询的结果

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE test_id= "T0003" and SignalName = "Distance";

如下所示:

test_idvalue
T00031.0
T00032.0
T00033.0
T00034.0

(48000 rows)

这将一直持续到 1 条记录的 value (Distance) = 48000m (48000 rows): WHERE = ='T0003

总进程为3.84GB

对于整个表(约10,000条记录):

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE SignalName = "Distance";

如下所示:

test_idvalue
T00011.0
T00012.0
T00013.0
T00014.0

(514,048,748行)

总进程为3.84GB >

为什么两个查询的进程大小相同以及如何优化它奇异行提取?

I'm trying to optimise my query for when an internal customer only want to return one result *(and it's associated nested dataset). My aim is to reduce the query process size.

However, it appears to be the exact same value regardless of whether I'm querying for 1 record (with unnested 48,000 length array) or the whole dataset (10,000 records with unnest total 514,048,748 in total length of arrays)!

So my table results for one record query:

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE test_id= "T0003" and SignalName = "Distance";

looks like this:

test_idvalue
T00031.0
T00032.0
T00033.0
T00034.0

(48000 rows)

This will continue until value (Distance) = 48000m (48000 rows) for 1 record: WHERE == 'T0003.

Total process was 3.84GB

For whole table (~10,000 records):

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE SignalName = "Distance";

looks like this:

test_idvalue
T00011.0
T00012.0
T00013.0
T00014.0

(514,048,748 rows)

Total process was 3.84GB

Why are the process size the same for both queries and how can I optimise this for singular row extractions?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

独夜无伴 2025-01-23 16:10:50

发生这种情况是因为仍然需要进行全表扫描来查找与指定 ID 相等的所有测试 ID。

从您的示例中不清楚哪些列是 timeseries 记录的一部分。如果 test_id 不是其中之一,我建议将表聚集在 test_id 列上。通过聚类,数据会根据test_id列的内容自动组织。

因此,当您使用该列上的过滤器进行查询时,不需要完整扫描来查找所有值。

此处详细了解集群表。

This is happening because there is still need for a full table scan to find all the test IDs that are equal to the specified one.

It is not clear from your example which columns are part of the timeseries record. In case test_id is not one of them, I would suggest to cluster the table on the test_id column. By clustering, the data will be automatically organized according to the contents of the test_id column.

So, when you query with a filter on that column a full scan won't be needed to find all values.

Read more about clustered tables here.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文