How DISTINCT ON works in GCP Datastore

Posted 2025-01-13 09:15:07 · 1681 words · 3 views · 0 comments

Let's say I have a kind named 'audit' and it has the following entries:

tenantId  traceId   eventId
tenant1   traceId1  event1
tenant1   traceId1  event2
tenant1   traceId2  event3
tenant1   traceId2  event4

I need to get one row per unique traceId, taking whichever entry comes first, so the query should return:

tenantId  traceId   eventId
tenant1   traceId1  event1
tenant1   traceId2  event3

For the above, I am using SELECT DISTINCT ON (traceId) * FROM audit

Even though it's a simple query, my concern is how it will perform as my entries grow. I will have hundreds of thousands of entries in the datastore, but out of them, 50% might be unique on traceId.

I have read that Datastore is not meant for aggregations. So, my questions are:

  1. Is distinct on considered an aggregation query?
  2. Does distinct on work on index scan?
  3. Will distinct on increase my read cost?
  4. Will the built-in index handle the distinct on or should we define a composite index?


Comments (1)

故事与诗 2025-01-20 09:15:07

Is distinct on considered an aggregation query?

The DISTINCT ON clause ensures that only the first result for each distinct combination of values of the specified properties is returned, so it is not considered an aggregation query. Datastore also does not support aggregation queries.

The index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the Datastore mode query engine.

You can read about it in this document
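To make the "first result per distinct value" behaviour concrete, here is a minimal in-memory sketch in Python using the sample rows from the question. The helper name `first_per_distinct` is hypothetical, and this only models the result set, not how Datastore executes the query; in Datastore, which row counts as "first" depends on the order in which the query returns results.

```python
def first_per_distinct(rows, *properties):
    """Keep only the first row seen for each distinct combination of `properties`."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[prop] for prop in properties)
        if key not in seen:
            seen.add(key)
            result.append(row)  # the row is returned as-is, nothing is aggregated
    return result


# Sample data from the question, in the order it was listed.
audit_rows = [
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event1"},
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event2"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event3"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event4"},
]

unique_by_trace = first_per_distinct(audit_rows, "traceId")
for row in unique_by_trace:
    print(row["tenantId"], row["traceId"], row["eventId"])
```

Each output row is an unmodified input row, which is why this is a filtering step rather than an aggregation such as a count or sum.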

Does distinct on work on index scan?

Yes, DISTINCT ON works on an index scan, and you cannot apply DISTINCT ON to unindexed properties.

Will distinct on increase my read cost?

If you are using projection queries, adding DISTINCT ON will increase the cost, because the query then no longer counts as a small operation, as mentioned here. If you are not using projection queries, you are charged based on entity reads.

Will the built-in index handle the distinct on or should we define a composite index?

If you apply DISTINCT ON to a single property, e.g. SELECT DISTINCT ON (traceId) * FROM audit, it will work with the built-in index that is created when the entity is created. If you apply DISTINCT ON to multiple properties, e.g. SELECT DISTINCT ON (traceId, eventId) * FROM audit, it will not work with the built-in index and you have to create a composite index.
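As a rough sketch of how the two queries above might be issued from Python, the helper below sets `distinct_on` on a query object obtained from a client. It assumes the `google-cloud-datastore` package, whose `Client.query(kind=...)` returns a query with a `distinct_on` attribute; the helper name `build_distinct_query` and the `_StubClient` class are hypothetical and exist only so the snippet runs without credentials.

```python
from types import SimpleNamespace


def build_distinct_query(client, kind, distinct_properties):
    """Build the equivalent of: SELECT DISTINCT ON (<properties>) * FROM <kind>."""
    query = client.query(kind=kind)
    query.distinct_on = list(distinct_properties)
    return query


class _StubClient:
    """Hypothetical stand-in for datastore.Client, used only for an offline dry run."""

    def query(self, kind):
        return SimpleNamespace(kind=kind, distinct_on=[])


# Offline dry run: inspect what would be sent, without contacting Datastore.
single = build_distinct_query(_StubClient(), "audit", ["traceId"])
multi = build_distinct_query(_StubClient(), "audit", ["traceId", "eventId"])
print(single.kind, single.distinct_on)
print(multi.kind, multi.distinct_on)

# Against a real project (assumption: credentials and a default project are configured):
#   from google.cloud import datastore
#   client = datastore.Client()
#   for entity in build_distinct_query(client, "audit", ["traceId"]).fetch():
#       print(entity["traceId"], entity["eventId"])
```

Per the answer above, the single-property form should be served by the built-in index, while the two-property form would need a composite index to be defined before the query can run.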
