从桥接表中获取组密钥
我正在开发 ETL 流程,需要一个桥接表来实现事实表和维度表(MySQL 数据库)之间的一对多关系。组合的数量有限(大约数千),因此我想重新使用桥接表中的组键来限制大小。
属于事实行的任何维度组都将包含多个维度键(1 到大约 15),并分配给唯一的组键,如下所示:
group_key | dimension_key
-----------------------
1 | 1
1 | 3
1 | 4
2 | 1
2 | 2
2 | 3
3 | 1
3 | 4
如何检索维度 1,3 的唯一组键,4(即1)?
I'm developing an ETL process, and need a bridge table for a one-to-many relationship between a fact table and a dimension table (MySQL database). There is a limited number of combinations (some thousands), so I want to re-use group keys from the bridge table to to limit the size.
Any group of dimensions belonging to a fact row will consist of a number of dimension keys (1 to around 15), assigned to a unique group key, as below:
group_key | dimension_key
-----------------------
1 | 1
1 | 3
1 | 4
2 | 1
2 | 2
2 | 3
3 | 1
3 | 4
How do I go about retrieving the unique group key for the dimensions 1,3,4 (ie. 1)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为您要求一个返回组的查询,以便特定列表中的所有维度都与该组关联。也就是说,必须存在将该组映射到每个维度的行,并且您想知道哪些组满足这一点。
另一个解决方案是计算组中匹配的行数:
但我发现通常
GROUP BY
是性能杀手,尤其是在 MySQL 中。I think you're asking for a query that returns the groups such that all dimensions in a specific list are associated with the group. That is, rows must exist mapping that group to each of the dimensions, and you want to know which groups satisfy this.
The other solution is to count the matching rows in the group:
But I find that usually
GROUP BY
is a performance killer particularly in MySQL.如果我理解正确的话,你想要的是一个如下所示的桥接表:
我可以看到你有 2 个选项。
您可以将整个桥接表拉入程序中,并以编程方式从维度集中确定组键。
或者,您可以使用数学公式对维度键进行编码,以得出可以索引的整数。
类似于
a + (b * 32) + (c * 32 * 32) + ...
使用包含唯一维度数量的 2 的最低幂。If I understand you correctly, what you want is a bridge table that looks like this:
You have 2 options that I can see.
You can either pull the entire bridge table into a program, and programatically determine the group key from the dimension set.
Or you can encode the dimension key using a mathematical formula to come up with an integer than you can index.
Something like
a + (b * 32) + (c * 32 * 32) + ...
Use the lowest power of 2 that encompasses the number of unique dimensions.