对或错:好的设计要求每个表都有一个主键,如果没有别的,就是一个连续的整数
考虑一个杂货店场景(我正在编造),其中您有代表销售交易的事实记录,其中事实表的列包括
SaleItemFact Table
------------------
CustomerID
ProductID
Price
DistributorID
DateOfSale
Etc
Etc
Etc
即使当您考虑所有键时表中存在重复项,我也会争辩应该组成一个代理运行数字键(即标识列),例如 Integer 类型的 TransactionNumber。
我看到有人争论事实表可能没有唯一键(尽管我发明了一个并浪费了 4 个字节,但是维度表怎么样?
Consider a grocery store scenario (I'm making this up) where you have FACT records that represent a sale transaction, where the columns of the Fact table include
SaleItemFact Table
------------------
CustomerID
ProductID
Price
DistributorID
DateOfSale
Etc
Etc
Etc
Even if there are duplicates in the table when you consider ALL the keys, I would contend that a surrogate running numeric key (i.e. identity column) should be made up, e.g., TransactionNumber of type Integer.
I can see someone arguing that a Fact table might not have a unique key (though I'd invent one and waste the 4 bytes, but how about a dimension table?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
第一范式要求每个表都有一个主键。所以这是良好数据库设计的最低要求。您选择的主键有很多争议。但数据库设计的第一范式却不是。
First normal form requires a primary key on every table. So this is the bare minimum required for good database design. What you choose for the primary key is open to much debate. But first normal form for database design is not.
每行拥有唯一键(从数据或其他方式构建)的众多原因之一是为了方便对该特定行的更新或删除。
无论如何,这个问题有点愚蠢,因为实际上不存在工程权衡。没有钥匙并没有真正的好处,那么有什么意义呢?是的/是的,行应该有唯一的标识符。
One reason, among many, to have a unique key per row (built from the data, or otherwise) is to facilitate updates or deletions to that specific row.
In any case, this question is kind of silly, because there really isn't an engineering trade-off at stake. There is no real proposed benefit to not having the key, so what's the point? True/yes, rows should have unique identifiers.
因为您的问题是在数据仓库下:
维度表应该有一个代理(无意义)主键,通常是一个自动递增整数;唯一标识表行描述的对象的业务键——例如电子邮件地址、全名或类似名称。
事实表大多数(几乎总是)有一个主键,它是两个或多个外键的组合。
事实
将外键组合到主键时,事实表中不应有重复项。要测试这一点,只需尝试加载同一事务两次 - 它应该会失败。自动生成的主键不会阻止这种情况,因为它不存在于仓库之外。通常可以通过将时间戳包含在主键中来解决该问题。
有时,事实表用作维度,或者在可充当维度的视图中使用。在这种情况下,使用一个(大)整数作为主键而不是多个 FK 字段会很方便——但是,FK 和时间戳的原始组合仍应唯一地标识事实行。
Because your question is under datawarehousing:
Dimension tables should have a surrogate (meaningless) primary key, usually an auto-increment integer; and a business key which uniquely identifies an object that the table row describes -- like an email address, full name or similar.
Fact tables mostly (almost always) have a primary key which is combination of two or more foreign keys.
There should be no duplicates in fact tables when combining foreign keys into the primary key. To test this, simply try to load the same transaction twice -- it should fail. An auto generated primary key will not prevent this, because it does not exists outside the warehouse. The problem can be usually solved by including the time-stamp into the primary key.
Sometimes a fact table is used as a dimension, or in a view that may act as a dimension. In this case it is convenient to have one (big)integer as a primary key, instead of several FK fields -- however, the original combination of FKs and time-stamp(s) should still uniquely identify the fact row.
对于数据仓库,事实表通常具有复合主键,通常是维度表的所有外键的组合。
在事实表中没有任何主键也是很常见的,因为它们通常除了浪费空间之外没有任何用途 - 对于大型数据仓库来说,空间可能非常大。不过,您的维度表将具有主键。
如果您谈论的是杂货店的 OLTP 部分,您通常会遵循标准 OLTP 数据库设计、规范化表并提供主键。
For data warehouses, fact tables often have a composite primary key, usually the composite of all the foreign keys to your dimension tables.
It's rather common to not have any primary key in your fact tables as well, as they often serve no purpose other than wasting space - and for large datawarehouses the space can be quite big. Your dimension tables will have primary keys though.
If you're talking about the OLTP part of your grocery store, you would normally follow standard OLTP database design, normalize your tables and provide a primary key.
真的。从概念上思考,一切都是唯一的,即使它不是由唯一数据定义的。因此,如果您将数据输入到表中并且它们具有完全相同的信息,则它们仍然是唯一的,因为它们被输入了两次。
除此之外,您还可以获得能够以相对较低的成本(4 字节)基于 id 轻松选择、更新、删除的好处。可以说,表越大,id 就越有用。因此,表越大,4 个字节就变得越来越少:-)
True. Think conceptually, everything is unique even if its not defined by unique data. So if you enter data into a table and they have the exact same information, they are still unique as they were entered twice.
Leaving that, you gain the benefit of being able to easily select, update, delete based on the id at a relatively low cost (4 bytes). Which arguably, the larger the table, the more useful the id is. So the 4 bytes becomes less and less of a point the larger the table :-)