google-cloud-datastore google-cloud-platform

How does DISTINCT ON work in GCP Datastore?

Posted 2025-01-13 09:15:07

Let's say I have a kind named 'audit' and it has the following entries:

tenantId	traceId	eventId
tenant1	traceId1	event1
tenant1	traceId1	event2
tenant1	traceId2	event3
tenant1	traceId2	event4

I need one row per distinct traceId, keeping whichever entry comes first, so the query should return:

tenantId	traceId	eventId
tenant1	traceId1	event1
tenant1	traceId2	event3

For the above, I am using select distinct on(traceId) * from audit

Even though it's a simple query, my concern is how it will perform as my entries grow. I will have hundreds of thousands of entries in Datastore, and about 50% of them might be unique on traceId.

I have read that Datastore is not meant for aggregations. So, my questions are:

Is distinct on considered an aggregation query?
Does distinct on work on index scan?
Will distinct on increase my read cost?
Will the built-in index handle the distinct on or should we define a composite index?

故事与诗 2025-01-20 09:15:07

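Is distinct on considered an aggregation query?

The distinct on clause ensures that only the first result is returned for each distinct combination of values of the specified properties, so it is not considered an aggregation query. Moreover, according to the Datastore documentation, aggregation queries are not supported:

The index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the Datastore mode query engine.

You can read more about this in the Datastore documentation.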
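To make the "first result per distinct value" behaviour concrete, here is a minimal pure-Python sketch that mimics it on the sample rows from the question. It does not call Datastore: distinct_on_first is a hypothetical helper, and sorting by (traceId, eventId) is an assumption standing in for the order in which an index would be scanned.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question (kind 'audit').
rows = [
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event1"},
    {"tenantId": "tenant1", "traceId": "traceId1", "eventId": "event2"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event3"},
    {"tenantId": "tenant1", "traceId": "traceId2", "eventId": "event4"},
]


def distinct_on_first(entities, prop, order_by):
    """Hypothetical helper: keep the first entity per distinct value of `prop`.

    `order_by` must start with `prop` so that equal values are adjacent
    after sorting, which is what groupby relies on.
    """
    ordered = sorted(entities, key=itemgetter(*order_by))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(prop))]


result = distinct_on_first(rows, "traceId", ["traceId", "eventId"])
for row in result:
    print(row["tenantId"], row["traceId"], row["eventId"])
```

Nothing is summed or counted here: each output row is one of the input rows, picked by position in the sort order, which is why this is a selection rather than an aggregation.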

Does distinct on work on index scan?

Yes, distinct on works on an index scan, and you cannot apply distinct on to a property that is not indexed.

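Will distinct on increase my read cost?

If you are using a projection query, then adding distinct on will increase the cost, because the query no longer counts as a small operation, as described in the pricing documentation. If you are not using a projection query, you are charged based on entity reads.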

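Will the built-in index handle the distinct on or should we define a composite index?

If you apply distinct on to a single property, i.e. select distinct on(traceId) * from audit, it will use the built-in index that is created when the entity is created. If you apply distinct on to multiple properties, i.e. select distinct on(traceId,eventId) * from audit, the built-in index is not enough and you have to create a composite index.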
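For the multi-property case, a composite index definition might look like the index.yaml fragment below. This is only a sketch: it assumes the kind and property names from the question and ascending order on both properties, and the index Datastore actually asks for may differ if the query also has filters or a different sort order.

```yaml
# index.yaml (sketch; kind and property names taken from the question)
indexes:
- kind: audit
  properties:
  - name: traceId
  - name: eventId
```

A file like this is typically deployed with gcloud datastore indexes create index.yaml, and the query can only be served once the index has finished building.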