How DISTINCT ON works in GCP Datastore
Let's say I have a kind named 'audit' and it has the following entries:
tenantId | traceId | eventId |
---|---|---|
tenant1 | traceId1 | event1 |
tenant1 | traceId1 | event2 |
tenant1 | traceId2 | event3 |
tenant1 | traceId2 | event4 |
I need one row per unique traceId, taking whichever entry comes first, so the query should return:
tenantId | traceId | eventId |
---|---|---|
tenant1 | traceId1 | event1 |
tenant1 | traceId2 | event3 |
For the above, I am using `select distinct on(traceId) * from audit`.
Even though it's a simple query, my concern is how it will perform as my entries grow. I will have hundreds of thousands of entries in Datastore, and about 50% of them might be unique on traceId.
I have read that Datastore is not meant for aggregations. So, my questions are:
- Is `distinct on` considered an aggregation query?
- Does `distinct on` work on an index scan?
- Will `distinct on` increase my read cost?
- Will the built-in index handle the `distinct on`, or should we define a composite index?
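To make the expected result concrete, here is a minimal plain-Python sketch of the "first row per distinct value" behaviour described in the question. It does not talk to Datastore at all. The helper name `distinct_on` and the use of list order as "first" are assumptions made for illustration; in a real query, which row counts as first depends on the order in which the index returns results.

```python
def distinct_on(rows, keys):
    """Keep the first row seen for each distinct combination of `keys`.

    Illustrative helper only: "first" here means first in list order.
    """
    seen = set()
    result = []
    for row in rows:
        combo = tuple(row[k] for k in keys)
        if combo not in seen:
            seen.add(combo)
            result.append(row)
    return result


# The sample entries from the 'audit' kind in the question.
audit = [
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event1"},
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event2"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event3"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event4"},
]

first_per_trace = distinct_on(audit, ["traceId"])
for row in first_per_trace:
    print(row["tenantId"], row["traceId"], row["eventId"])
```

With the four sample rows this keeps event1 for traceId1 and event3 for traceId2, matching the result table in the question.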
The `distinct on` clause ensures that only the first result for each distinct combination of values for the specified properties is returned, so it is not considered an aggregation query. Also, Datastore does not support aggregation queries. You can read about it in this document.

Yes, `distinct on` works on an index scan, and you cannot apply `distinct on` to unindexed properties.

If you are using projection queries, then using `distinct on` will increase the cost, because the query no longer counts as a small operation, as mentioned here. If you are not using projection queries, you are charged based on entity reads.

If you apply `distinct on` to a single property, e.g. `select distinct on(traceId) * from audit`, it works with the built-in index that is created when the entity is created. If you apply `distinct on` to multiple properties, e.g. `select distinct on(traceId, eventId) * from audit`, it does not work with the built-in index and you have to create a composite index.
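For readers who use the Python client library rather than GQL, the single-property case might look roughly like the sketch below. It assumes `client` is a `google.cloud.datastore.Client` created elsewhere; the function name `first_entity_per_trace` is hypothetical. Treat it as an untested outline, and note that it says nothing about which index the query ends up using.

```python
def first_entity_per_trace(client, kind="audit"):
    """Fetch one entity per distinct traceId.

    Assumption: `client` behaves like google.cloud.datastore.Client,
    i.e. client.query(kind=...) returns a query object that has a
    `distinct_on` attribute and a fetch() method.
    """
    query = client.query(kind=kind)
    # Keep only the first result for each distinct traceId value.
    query.distinct_on = ["traceId"]
    return list(query.fetch())
```

A call would then be `first_entity_per_trace(datastore.Client())`, with credentials and project configured in the environment. Setting `query.distinct_on = ["traceId", "eventId"]` instead corresponds to the multi-property case, which the answer above says needs a composite index.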