Do foreign keys reduce redundancy if you reference all the columns?
I've read that one of the benefits of normalization is to reduce redundancy in the DB. But I'm wondering whether that still holds if you end up referencing all the columns in the target table.
For example, if I have a Video table that references a Genre table, the Genre table might very likely have a single column with a dozen fairly static values like 'Horror' 'Sci-Fi' 'Romance' etc.
In a case like this, does it save any space to separate the two, or is the only benefit making it so you can update all referencing rows from one place?
Comments (10)
Right, space saving is ONE of the benefits, not the only one.
In the case you mentioned, no, you'll save no space if you use that one column as the PK, which is fine.
You could abstract that table with an autonumber/sequence and use that as the PK, and make the current column a candidate key (so it stays unique).
But leaving your design exactly as you've outlined, the benefit is in consistency. You'll have only those 12 values, so you won't accidentally enter a value like "Horrer" or "PSY-Fi".
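A minimal sketch of that consistency benefit, using Python's built-in sqlite3 module. The table and column names (genre, video, name, title) are assumptions made up for illustration, and note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical schema: the genre name itself is the primary key.
conn.execute("CREATE TABLE genre (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE video ("
    " title TEXT NOT NULL,"
    " genre TEXT NOT NULL REFERENCES genre(name))"
)
conn.executemany(
    "INSERT INTO genre (name) VALUES (?)",
    [("Horror",), ("Sci-Fi",), ("Romance",)],
)

# A valid genre is accepted.
conn.execute("INSERT INTO video VALUES ('Alien', 'Sci-Fi')")

# A misspelled genre is rejected by the foreign key.
try:
    conn.execute("INSERT INTO video VALUES ('The Thing', 'Horrer')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True

video_count = conn.execute("SELECT COUNT(*) FROM video").fetchone()[0]
```

No space is saved in this layout, since each video row still stores the full genre text, but the misspelled row never makes it into the table.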
Saving space is one benefit of separating the two tables. As was said before, putting a Genre_ID in place of an actual value such as "Horror" or "Adventure" will save space.
In my opinion, the better reason for doing this is to enforce integrity. If you put the text values in the Video table, what prevents you from changing a value accidentally? Now some rows may have "Adventure" or "Action/Adventure" and so on. By having two tables and referencing with a foreign key, you're going to have better control over what values can be a genre.
In summary, don't worry about the fact that you reference all the columns, especially if a table has very few columns. If you decide to add an ID field, or just keep the 1 column table as a list of "acceptable values", your goal should be to enforce integrity first, and save space or I/O costs second.
I would use surrogate keys (Autonumber, Identity, etc) and use that for the foreign key join instead of the actual value.
The idea is more about data quality than reducing space.
In most databases an INT will be smaller than a VARCHAR2(20).
Yes, it will save space if you have a surrogate key (int) which you use in the Video table instead of the varchar(20) or whatever the genre column would be.
But you've hit the problem yourself there:
With surrogate keys and normalized tables, you only have "Horror" stored once in the database, but its ID number is stored in several places (a simple number is smaller than the text most of the time, and does save space). Not only does this increase the maintainability of the database, it does indeed save raw space.
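A rough sketch of the surrogate-key layout described above, again with Python's sqlite3 module. The names genre_id, name and title are assumed for illustration and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: an integer surrogate key, with the text kept unique.
conn.execute(
    "CREATE TABLE genre ("
    " genre_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE video ("
    " title TEXT NOT NULL,"
    " genre_id INTEGER NOT NULL REFERENCES genre(genre_id))"
)

conn.execute("INSERT INTO genre (genre_id, name) VALUES (1, 'Horror')")
conn.executemany(
    "INSERT INTO video (title, genre_id) VALUES (?, 1)",
    [("Alien",), ("The Thing",), ("Halloween",)],
)

# The text 'Horror' exists in exactly one row of the whole database.
horror_copies = conn.execute(
    "SELECT COUNT(*) FROM genre WHERE name = 'Horror'"
).fetchone()[0]

# A join brings the text back whenever it is needed.
rows = conn.execute(
    "SELECT v.title, g.name FROM video v"
    " JOIN genre g ON g.genre_id = v.genre_id"
    " ORDER BY v.title"
).fetchall()
```

Each video row carries only the small integer, and the join recovers the label on demand.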
What happens if you want to ensure that your rows in the Video table have valid/predetermined entries for Genre? If you don't have a foreign key constraint you would need an enum for that column in the Video table and then you would have to change the schema every time you add a new Genre instead of just adding a new row to a Genre table.
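To make the trade-off concrete, here is a small sketch comparing the two approaches in SQLite through Python's sqlite3 module. SQLite has no enum type, so a CHECK ... IN (...) constraint stands in for one here; that substitution and all table names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Option A: allowed values are baked into the schema (enum-like CHECK).
conn.execute(
    "CREATE TABLE video_enum ("
    " title TEXT NOT NULL,"
    " genre TEXT NOT NULL"
    " CHECK (genre IN ('Horror', 'Sci-Fi', 'Romance')))"
)
try:
    conn.execute("INSERT INTO video_enum VALUES ('Unforgiven', 'Western')")
    enum_accepts_western = True
except sqlite3.IntegrityError:
    # Accepting 'Western' here would require changing the table definition.
    enum_accepts_western = False

# Option B: allowed values live in a lookup table behind a foreign key.
conn.execute("CREATE TABLE genre (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE video ("
    " title TEXT NOT NULL,"
    " genre TEXT NOT NULL REFERENCES genre(name))"
)
conn.executemany(
    "INSERT INTO genre (name) VALUES (?)",
    [("Horror",), ("Sci-Fi",), ("Romance",)],
)

# Adding a genre is an ordinary row insert, with no schema change.
conn.execute("INSERT INTO genre (name) VALUES ('Western')")
conn.execute("INSERT INTO video VALUES ('Unforgiven', 'Western')")
fk_accepts_western = conn.execute(
    "SELECT COUNT(*) FROM video WHERE genre = 'Western'"
).fetchone()[0] == 1
```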
In cases like that, your key values plus their indexes can be considerably larger than the data itself. Another model for simple codes like that is to have a table of codes and then an insert and update check constraint to validate them. That also avoids a join in order to get the genre data out. Which way you do it is kind of a toss-up and would depend on what your application queries tend to be.
Data modification anomalies
It gets worse if you have another table, say "Books", that uses the same genres.
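One way such an anomaly can show up, sketched with Python's sqlite3 module. The tables (video_flat, book_flat, genre, video, book) and the rename from 'Sci-Fi' to 'Science Fiction' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: each table stores its own copy of the genre text.
conn.execute("CREATE TABLE video_flat (title TEXT, genre TEXT)")
conn.execute("CREATE TABLE book_flat (title TEXT, genre TEXT)")
conn.execute("INSERT INTO video_flat VALUES ('Alien', 'Sci-Fi')")
conn.execute("INSERT INTO book_flat VALUES ('Dune', 'Sci-Fi')")

# Someone renames the genre in one table and forgets the other.
conn.execute(
    "UPDATE video_flat SET genre = 'Science Fiction' WHERE genre = 'Sci-Fi'"
)
flat_labels = {
    row[0]
    for row in conn.execute(
        "SELECT genre FROM video_flat UNION SELECT genre FROM book_flat"
    )
}

# Normalized: both tables point at a single genre row.
conn.execute(
    "CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE video (title TEXT, genre_id INTEGER REFERENCES genre(genre_id))"
)
conn.execute(
    "CREATE TABLE book (title TEXT, genre_id INTEGER REFERENCES genre(genre_id))"
)
conn.execute("INSERT INTO genre VALUES (1, 'Sci-Fi')")
conn.execute("INSERT INTO video VALUES ('Alien', 1)")
conn.execute("INSERT INTO book VALUES ('Dune', 1)")

# One update renames the genre everywhere it is referenced.
conn.execute("UPDATE genre SET name = 'Science Fiction' WHERE genre_id = 1")
normalized_labels = {
    row[0]
    for row in conn.execute(
        "SELECT g.name FROM video v JOIN genre g ON g.genre_id = v.genre_id"
        " UNION "
        "SELECT g.name FROM book b JOIN genre g ON g.genre_id = b.genre_id"
    )
}
```

The flat tables end up disagreeing about what the genre is called, while the normalized ones cannot.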
Normalization has nothing to do with saving space. It's about eliminating potential anomalies that can occur as a result of certain kinds of redundancy. Since normalization defines the logical level only, it would be quite possible for a normalized database to be physically larger or physically smaller than a denormalized or un-normalized one.
It is true that normalization generally makes designs that ought to translate efficiently into storage - but that's really down to the features of the DBMS rather than something implicit in normalization.
You will also save space because 'Horror' takes 12 bytes when stored as UTF-16 (2 bytes per character), while GenreId can be a Byte or char(1).
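The arithmetic can be checked quickly in Python. This only measures encoded string lengths; actual on-disk size depends on the DBMS and column type, which are not specified in the thread.

```python
label = "Horror"

# UTF-16 without a byte-order mark: 2 bytes for each of these 6 characters.
utf16_bytes = len(label.encode("utf-16-le"))

# UTF-8 for comparison: plain ASCII letters take 1 byte each.
utf8_bytes = len(label.encode("utf-8"))

# A one-byte identifier can distinguish up to 256 genres.
one_byte_id = (1).to_bytes(1, "big")
id_bytes = len(one_byte_id)
```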