从桥接表中获取组密钥

发布于 2024-09-04 22:07:20 字数 367 浏览 6 评论 0原文

我正在开发 ETL 流程,需要一个桥接表来实现事实表和维度表(MySQL 数据库)之间的一对多关系。组合的数量有限(大约数千),因此我想重新使用桥接表中的组键来限制大小。

属于事实行的任何维度组都将包含多个维度键(1 到大约 15),并分配给唯一的组键,如下所示:

group_key | dimension_key
-----------------------
1         | 1
1         | 3
1         | 4
2         | 1
2         | 2
2         | 3
3         | 1
3         | 4

如何检索维度 1,3 的唯一组键,4(即1)?

I'm developing an ETL process, and need a bridge table for a one-to-many relationship between a fact table and a dimension table (MySQL database). There is a limited number of combinations (some thousands), so I want to re-use group keys from the bridge table to to limit the size.

Any group of dimensions belonging to a fact row will consist of a number of dimension keys (1 to around 15), assigned to a unique group key, as below:

group_key | dimension_key
-----------------------
1         | 1
1         | 3
1         | 4
2         | 1
2         | 2
2         | 3
3         | 1
3         | 4

How do I go about retrieving the unique group key for the dimensions 1,3,4 (ie. 1)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

秋日私语 2024-09-11 22:07:20

我认为您要求一个返回组的查询,以便特定列表中的所有维度都与该组关联。也就是说,必须存在将该组映射到每个维度的行,并且您想知道哪些组满足这一点。

SELECT f1.group_key
FROM facts f1
JOIN facts f2 ON (f1.group_key = f2.group_key)
JOIN facts f2 ON (f1.group_key = f2.group_key)
WHERE f1.dimension_key = 1
  AND f2.dimension_key = 3
  AND f3.dimension_key = 4;

另一个解决方案是计算组中匹配的行数:

SELECT f.group_key
FROM facts f
WHERE f.dimension_key IN (1,3,4)
GROUP BY f.group_key
HAVING COUNT(*) = 3;

但我发现通常GROUP BY 是性能杀手,尤其是在 MySQL 中。

I think you're asking for a query that returns the groups such that all dimensions in a specific list are associated with the group. That is, rows must exist mapping that group to each of the dimensions, and you want to know which groups satisfy this.

SELECT f1.group_key
FROM facts f1
JOIN facts f2 ON (f1.group_key = f2.group_key)
JOIN facts f2 ON (f1.group_key = f2.group_key)
WHERE f1.dimension_key = 1
  AND f2.dimension_key = 3
  AND f3.dimension_key = 4;

The other solution is to count the matching rows in the group:

SELECT f.group_key
FROM facts f
WHERE f.dimension_key IN (1,3,4)
GROUP BY f.group_key
HAVING COUNT(*) = 3;

But I find that usually GROUP BY is a performance killer particularly in MySQL.

缺⑴份安定 2024-09-11 22:07:20

如果我理解正确的话,你想要的是一个如下所示的桥接表:

group_key | dimension_set
-----------------------
1         | (1, 3, 4)
2         | (1, 2, 3)
3         | (1, 4)

我可以看到你有 2 个选项。

您可以将整个桥接表拉入程序中,并以编程方式从维度集中确定组键。

或者,您可以使用数学公式对维度键进行编码,以得出可以索引的整数。

类似于 a + (b * 32) + (c * 32 * 32) + ... 使用包含唯一维度数量的 2 的最低幂。

If I understand you correctly, what you want is a bridge table that looks like this:

group_key | dimension_set
-----------------------
1         | (1, 3, 4)
2         | (1, 2, 3)
3         | (1, 4)

You have 2 options that I can see.

You can either pull the entire bridge table into a program, and programatically determine the group key from the dimension set.

Or you can encode the dimension key using a mathematical formula to come up with an integer than you can index.

Something like a + (b * 32) + (c * 32 * 32) + ... Use the lowest power of 2 that encompasses the number of unique dimensions.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文