如何设计交付数据的事实表
我正在构建一个数据仓库,其中包含餐馆的送货信息。 数据存储在 SQL Server 2005 中,然后放入 SQL Server Analysis Services 2005 多维数据集中。
交货信息由以下表格组成:
FactDeliveres
- BranchKey
- DeliveryDateKey
- ProductKey
- InvoiceNumber(DD:退化维度)
- Quantity
- UnitCosT
- Linecost
注意:
- FactDeliveres 的粒度是发票上的每一
- 行产品维度包括供应商信息
问题:事实表没有主键。 主键应该是唯一标识每个交付的东西加上 ProductKey。 但我无法唯一地识别交货。
在源OLTP数据库中,有一个DeliveryID,每次交付都是唯一的,但这是一个内部ID,对用户来说毫无意义。 InvoiceNumber 是供应商的发票号码——这是手动输入的,因此我们会得到重复的号码。
在多维数据集中,我仅基于 FactDeliveres 中的 InvoiceNumber 字段创建了一个维度。 这确实意味着,当您按 InvoiceNumber 进行分组时,您可能会合并 2 个交货,只是因为它们(错误地)具有相同的 InvoiceNumber。
我觉得我需要包含 DeliveryID(称为 DeliveryKey),但我不确定如何包含。
那么,我应该:
- 使用它作为 InvoiceNumber 维度的基础键吗?
- 创建一个每次有新交付时都会增长的 DimDelivery? 这可能意味着某些属性来自 FactDeliveries 并进入 DimDelivery,例如 DeliveryDate、Supplier、InvoiceNumber。
毕竟,我可以问您:当我的源数据库中有以下信息时,如何创建交付多维数据集
DeliveryHeaders
- DeliveryID (PK)
- DeliveryDate
- SupplyID (FK)
- InvoiceNumber(手动输入)
交货详细信息
- 交货 ID (PK)
- 产品 ID (PK)
- 数量
- UnitCosT
I'm building a data warehouse that includes delivery information for restaurants. The data is stored in SQL Server 2005 and is then put into a SQL Server Analysis Services 2005 cube.
The Deliveries information consists of the following tables:
FactDeliveres
- BranchKey
- DeliveryDateKey
- ProductKey
- InvoiceNumber (DD: degenerate dimension)
- Quantity
- UnitCosT
- Linecost
Note:
- The granularity of FactDeliveres is each line on the invoice
- The Product dimension include supplier information
And the problem: there is no primary key for the fact table. The primary key should be something that uniquely identifies each delivery plus the ProductKey. But I have no way to uniquely identify a delivery.
In the source OLTP database there is a DeliveryID that is unique for every delivery, but that is an internal ID that meaningless to users. The InvoiceNumber is the suppliers' invoices number -- this is typed in manually and so we get duplicates.
In the cube, I created a dimension based only on the InvoiceNumber field in FactDeliveres. That does mean that when you group by InvoiceNumber, you might get 2 deliveries combined only because they (mistakenly) have the same InvoiceNumber.
I feel that I need to include the DeliveryID (to be called DeliveryKey), but I'm not sure how.
So, do I:
- Use that as the underlying key for the InvoiceNumber dimension?
- Create a DimDelivery that grows every time there is a new delivery? That could mean that some attributes come out of FactDeliveries and go into DimDelivery, like DeliveryDate,Supplier, InvoiceNumber.
After all that, I could just ask you: how do I create a Deliveries cube when I have the following information in my source database
DeliveryHeaders
- DeliveryID (PK)
- DeliveryDate
- SupplierID (FK)
- InvoiceNumber (typed in manually)
DeliveryDetails
- DeliveryID (PK)
- ProductID (PK)
- Quantity
- UnitCosT
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
事实表 PK 几乎总是代理键。 每个事实都是多个维度的一部分,因此该事实对维度有 FK,但没有自己的真正键。
交货事实(行项目)属于一个分支,它有一个产品,它是更大交货的一部分,它发生在特定日期。 听起来像是 4 个独立的维度。
交货维度有其自己的 PK,并且具有发票编号的维度属性。 另外,也许还有整个交付的其他属性。
每个交货行项目事实都与一次交货以及该交货的发票号码相关联。
Fact table PK's are almost always surrogate keys. Each fact is part of several dimensions, so the fact has FK's to the dimensions, but no real keys of it's own.
A Delivery Fact (a Line Item) belongs to a Branch, it has a Product, it is part of a larger Delivery, it occurs on a particular Date. Sounds like 4 independent dimensions.
The Delivery dimension has it's own PK and it has a dimension attribute of invoice number. Plus, perhaps, other attributes of the delivery as a whole.
Each Delivery Line Item Fact is associated with one Delivery and the invoice number for that Delivery.
我会将 Quantity、UnitCode、InvoiceNumber、DeliveryID 全部包含在事实表中。 InvoiceNumber 和 DeliveryID 都是退化维度,因为它们会随着每个事实(或很少的事实)而变化。 如果每个订单上有大量商品,您可以将它们放入自己的维度中。 如果您的发票上有多次交货,下面的模型可能不是 100% 正确,但也很接近。 看看 Kimball,他可能有一个针对此业务场景的星型模式示例。
使用通常的日期维度表和以下维度:
永远记住,您的(星型模式)数据仓库的结构根本不会像您的 OLTP 数据一样 - 它完全与事实以及描述它们的维度有关。
I would have Quantity, UnitCode, InvoiceNumber, DeliveryID all in the fact table. Both InvoiceNumber and DeliveryID are degenerate dimensions, because they will change with every fact (or very few facts). It is possible that you could put them in their own dimension if you have a large number of items on each order. The model below may not be 100% correct if you have multiple deliveries on an invoice, but it will be close. Check out Kimball, he might have an example of a star schema for this business scenario.
with the usual date dimension table and the following dimensions:
Always remember, your (star schema) data warehouse is not going to be structured at all like your OLTP data - it's all about the facts and what dimensions describe them.