数据仓库中的标记维度

发布于 2024-10-31 22:53:11 字数 425 浏览 9 评论 0原文

在我的数据仓库中，我有一个维度，其中每条记录可以有一个或多个标签。是否有任何直接的方法来对具有不同标签的维度进行建模？

我最初的想法是存储逗号分隔的列表，然后使用 MySQL:s FIND_IN_SET() 函数来测试标签是否存在。这允许我对数据进行切片。

但是，如果我希望能够按标签分组，我应该如何对我的方案进行建模？

示例：

两种产品：产品 A 和产品 B。A 标记为“foo”、“bar”。 B 被标记为“bar”、“baz”。

查询：销售额，按产品标签分组。标签“bar”的组必须包含产品 A 和 B 的销售额：

foo -> sales for A
bar -> sales for B and A
baz -> sales for C

原文

In my data warehouse I got a dimension where each record can have one or several tags.
Are there any straight forward ways to model a dimension with different tags?

My initial idea was to store the comma separated list, and then use MySQL:s FIND_IN_SET() function to test for the presence of a tag. This allows me to slice the data.

But how should I model my scheme if I want to be able to group by tag?

Example:

Two products: Product A and product B. A is tagged with "foo", "bar". B is tagged with "bar", "baz".

Query: sales amounts, grouped by product tag. Group for tag "bar" must include sales for both products A and B:

foo -> sales for A
bar -> sales for B and A
baz -> sales for C

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

焚却相思 2024-11-07 22:53:11

例如，您可以存储按标签重复的数据。

如果您的销售额 a=10 foo + bar 和 b=20 bar + baz，您可以像这样存储它。

    sales_by_tag_facts
    id, tag, sale_id, amount, is_primary_record
    1   foo  A        10     true
    2   bar  A        10     false
    3   bar  B        20     true
    4   baz  B        20     false

select sum(amount) from sales_by_tag_facts group by tag; // by tag
select sum(amount) from sales_by_tag_facts where is_primary_record=true; // without tag.

For example you could store your data duplicated by tag.

If you have sales a=10 foo + bar and b=20 bar + baz you could store it like this.

    sales_by_tag_facts
    id, tag, sale_id, amount, is_primary_record
    1   foo  A        10     true
    2   bar  A        10     false
    3   bar  B        20     true
    4   baz  B        20     false

select sum(amount) from sales_by_tag_facts group by tag; // by tag
select sum(amount) from sales_by_tag_facts where is_primary_record=true; // without tag.

回复收藏 0 原文

臻嫒无言 2024-11-07 22:53:11

为什么不将繁重的工作从报告转移到交易预订

您可以添加一个名为：

TagTotal 的新表，该表存储每个标签的总金额，并通过 BEFORE(AFTER)_UPDATE_EACH 触发器更新每笔交易。

额外字段/表

如果产品表中有 2 个额外字段：

product.amount    decimal(10,2) running total of sales to date
product.last_sale date          date of the last sale

标签总计表如下所示。

tag.id primary autoinc    
tag.tagstr varchar(25)     
tag.amount decimal(10,2)
tag.date_from date   #dates to keep the running totals per month/week/day.
tag.date_to   date

伪代码

CREATE TRIGGER ai_sales_each AFTER INSERT ON sales FOR EACH ROW
BEGIN
  UPDATE product SET product.amount = product.amount + new.amount,
    product.last_sale = sale.date
END

CREATE TRIGGER au_product_each AFTER UPDATE ON product FOR EACH ROW
BEGIN
  DECLARE AllTags VARCHAR(255);
  DECLARE ThisTag VARCHAR(25);

  IF old.tags <> new.tags THEN BEGIN
    reorganize the running totals in the tagtotal table.
  END; END IF;

  SET AllTags = new.tags;
  WHILE AllTags has more tags BEGIN
    SET ThisTag = NextTag(AllTags);
    UPDATE TagTotals SET amount = amount + new.amount
      WHERE TagTotals.tagstr = ThisTag 
      AND new.last_date BETWEEN TagTotals.date_from AND TagTotals.date_to;
  END; END WHILE;

现在，如果您想要每个标签的销售总额，您只需从 tagtotals 表中进行选择即可。
这将立即产生结果。

Why not shift the heavy lifting from reporting to the transaction booking

You can add a new table called:

TagTotal that stores the total amount per tag and updates with with every transaction via a BEFORE(AFTER)_UPDATE_EACH trigger.

Extra fields/table

If you have 2 extra fields in the product table:

product.amount    decimal(10,2) running total of sales to date
product.last_sale date          date of the last sale

And the tag totals table looks like this.

tag.id primary autoinc    
tag.tagstr varchar(25)     
tag.amount decimal(10,2)
tag.date_from date   #dates to keep the running totals per month/week/day.
tag.date_to   date

pseudo code

CREATE TRIGGER ai_sales_each AFTER INSERT ON sales FOR EACH ROW
BEGIN
  UPDATE product SET product.amount = product.amount + new.amount,
    product.last_sale = sale.date
END

CREATE TRIGGER au_product_each AFTER UPDATE ON product FOR EACH ROW
BEGIN
  DECLARE AllTags VARCHAR(255);
  DECLARE ThisTag VARCHAR(25);

  IF old.tags <> new.tags THEN BEGIN
    reorganize the running totals in the tagtotal table.
  END; END IF;

  SET AllTags = new.tags;
  WHILE AllTags has more tags BEGIN
    SET ThisTag = NextTag(AllTags);
    UPDATE TagTotals SET amount = amount + new.amount
      WHERE TagTotals.tagstr = ThisTag 
      AND new.last_date BETWEEN TagTotals.date_from AND TagTotals.date_to;
  END; END WHILE;

Now if you want the sales totals per tag, you can just select from the tagtotals table.
Which will give instant results.

回复收藏 0 原文

ペ泪落弦音 2024-11-07 22:53:11

我建议反对，它违反了规范化规则。
请参阅我一直搞砸 1NF
或者阅读标准化标签下的帖子。

重新设计表格的建议

如果您像这样制作标签和标签链接表格。

table tag
  id autoincrement integer primary index 
  tag_str varchar(20) index unique not null

table taglink
  id autoincrement integer primary index #gotta have an ID in my book.
  tag_id integer not null
  product_id integer not null

你有一个类似这样的销售表。

table product
  id autoincement integer primary index
  desc varchar(255)
  barcode, price, whatever ...

选择语句来查找每个标签的产品

您可以按如下方式查找与标签匹配的文章。

select * from product
inner join taglink on (product.id = taglink.product_id)
inner join tag on (taglink.tag_id = tag.id)
where tag.tag_str in ('foo','bar','baz');

选择语句列出每个产品的标签

select tag_str from tag
inner join taglink on (taglink.tag_id = tag.id)
inner join product on (taglink.product_id = product.id)
where product.barcode = '4548215' or product.desc like 'OMG Po%'

添加新标签

要添加新标签，只需

insert into tag (id, tag_str) values (
  null /*remember autoincrement*/
  ,'Mytag');

链接标签

将标签链接到产品

set @product_id = 10;
set @tag_id = 1;
...or...
select @product_id:= product.id from product where product.barcode = '1254851';
...
insert into taglink (id, product_id, tag_id) values (
  null /*autoinc id*/
  ,@product_id
  ,@tag_id );

您可以将无限数量的标签链接到产品，并且您不会因为成本高昂的 FIND_IN_SET 语句而减慢查询速度。
并且可以防止重复的标签。
而且您的数据库将会更快、更小。

I would recommend against that, it goes against the normalization rules.
See I keep messing up 1NF
Or read up on the posts under the normalization tag.

Redesign suggestion for tables

If you make a tag and taglink table like so.

table tag
  id autoincrement integer primary index 
  tag_str varchar(20) index unique not null

table taglink
  id autoincrement integer primary index #gotta have an ID in my book.
  tag_id integer not null
  product_id integer not null

and you have a sales table something like this.

table product
  id autoincement integer primary index
  desc varchar(255)
  barcode, price, whatever ...

select statement to find products per tag

Than you can lookup articles that match a tag as follows.

select * from product
inner join taglink on (product.id = taglink.product_id)
inner join tag on (taglink.tag_id = tag.id)
where tag.tag_str in ('foo','bar','baz');

select statement to list tags per product

select tag_str from tag
inner join taglink on (taglink.tag_id = tag.id)
inner join product on (taglink.product_id = product.id)
where product.barcode = '4548215' or product.desc like 'OMG Po%'

Adding new tags

To add a new tag, just

insert into tag (id, tag_str) values (
  null /*remember autoincrement*/
  ,'Mytag');

linking a tag

To link a tag to a product

set @product_id = 10;
set @tag_id = 1;
...or...
select @product_id:= product.id from product where product.barcode = '1254851';
...
insert into taglink (id, product_id, tag_id) values (
  null /*autoinc id*/
  ,@product_id
  ,@tag_id );

You can link an unlimited number of tags to a product and you don't slow your queries down with costly FIND_IN_SET statements.
And you prevent duplicate tags.
And your database will be faster and smaller.

回复收藏 0 原文

~没有更多了~