数据仓库中的标记维度
在我的数据仓库中,我有一个维度,其中每条记录可以有一个或多个标签。 是否有任何直接的方法来对具有不同标签的维度进行建模?
我最初的想法是存储逗号分隔的列表,然后使用 MySQL:s FIND_IN_SET() 函数来测试标签是否存在。这允许我对数据进行切片。
但是,如果我希望能够按标签分组,我应该如何对我的方案进行建模?
示例:
两种产品:产品 A 和产品 B。A 标记为“foo”、“bar”。 B 被标记为“bar”、“baz”。
查询:销售额,按产品标签分组。标签“bar”的组必须包含产品 A 和 B 的销售额:
foo -> sales for A
bar -> sales for B and A
baz -> sales for C
In my data warehouse I got a dimension where each record can have one or several tags.
Are there any straight forward ways to model a dimension with different tags?
My initial idea was to store the comma separated list, and then use MySQL:s FIND_IN_SET() function to test for the presence of a tag. This allows me to slice the data.
But how should I model my scheme if I want to be able to group by tag?
Example:
Two products: Product A and product B. A is tagged with "foo", "bar". B is tagged with "bar", "baz".
Query: sales amounts, grouped by product tag. Group for tag "bar" must include sales for both products A and B:
foo -> sales for A
bar -> sales for B and A
baz -> sales for C
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
例如,您可以存储按标签重复的数据。
如果您的销售额 a=10 foo + bar 和 b=20 bar + baz,您可以像这样存储它。
For example you could store your data duplicated by tag.
If you have sales a=10 foo + bar and b=20 bar + baz you could store it like this.
为什么不将繁重的工作从报告转移到交易预订
您可以添加一个名为:
TagTotal 的新表,该表存储每个标签的总金额,并通过 BEFORE(AFTER)_UPDATE_EACH 触发器更新每笔交易。
额外字段/表
如果产品表中有 2 个额外字段:
标签总计表如下所示。
伪代码
现在,如果您想要每个标签的销售总额,您只需从 tagtotals 表中进行选择即可。
这将立即产生结果。
Why not shift the heavy lifting from reporting to the transaction booking
You can add a new table called:
TagTotal that stores the total amount per tag and updates with with every transaction via a BEFORE(AFTER)_UPDATE_EACH trigger.
Extra fields/table
If you have 2 extra fields in the product table:
And the tag totals table looks like this.
pseudo code
Now if you want the sales totals per tag, you can just select from the tagtotals table.
Which will give instant results.
我建议反对,它违反了规范化规则。
请参阅我一直搞砸 1NF
或者阅读标准化标签下的帖子。
重新设计表格的建议
如果您像这样制作标签和标签链接表格。
你有一个类似这样的销售表。
选择语句来查找每个标签的产品
您可以按如下方式查找与标签匹配的文章。
选择语句列出每个产品的标签
添加新标签
要添加新标签,只需
链接标签
将标签链接到产品
您可以将无限数量的标签链接到产品,并且您不会因为成本高昂的
FIND_IN_SET
语句而减慢查询速度。并且可以防止重复的标签。
而且您的数据库将会更快、更小。
I would recommend against that, it goes against the normalization rules.
See I keep messing up 1NF
Or read up on the posts under the normalization tag.
Redesign suggestion for tables
If you make a tag and taglink table like so.
and you have a sales table something like this.
select statement to find products per tag
Than you can lookup articles that match a tag as follows.
select statement to list tags per product
Adding new tags
To add a new tag, just
linking a tag
To link a tag to a product
You can link an unlimited number of tags to a product and you don't slow your queries down with costly
FIND_IN_SET
statements.And you prevent duplicate tags.
And your database will be faster and smaller.