Database design and the use of non-numeric primary keys
I'm currently designing the database tables for a customer & website management application. My question concerns using primary keys as functional parts of a table (rather than assigning "ID" numbers to every table just because).
For example, here are four related tables from the database so far, one of which uses the traditional primary key number, the others which use unique names as the primary key:
--
-- website
--
CREATE TABLE IF NOT EXISTS `website` (
`name` varchar(126) NOT NULL,
`client_id` int(11) NOT NULL,
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
`notes` text NOT NULL,
`website_status` varchar(26) NOT NULL,
PRIMARY KEY (`name`),
KEY `client_id` (`client_id`),
KEY `website_status` (`website_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- website_status
--
CREATE TABLE IF NOT EXISTS `website_status` (
`name` varchar(26) NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `website_status` (`name`) VALUES
('demo'),
('disabled'),
('live'),
('purchased'),
('transfered');
--
-- client
--
CREATE TABLE IF NOT EXISTS `client` (
`id` int(11) NOT NULL auto_increment,
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
`client_status` varchar(26) NOT NULL,
`firstname` varchar(26) NOT NULL,
`lastname` varchar(46) NOT NULL,
`address` varchar(78) NOT NULL,
`city` varchar(56) NOT NULL,
`state` varchar(2) NOT NULL,
`zip` int(11) NOT NULL,
`country` varchar(3) NOT NULL,
`phone` text NOT NULL,
`email` varchar(78) NOT NULL,
`notes` text NOT NULL,
PRIMARY KEY (`id`),
KEY `client_status` (`client_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
--
-- client_status
--
CREATE TABLE IF NOT EXISTS `client_status` (
`name` varchar(26) NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `client_status` (`name`) VALUES
('affiliate'),
('customer'),
('demo'),
('disabled'),
('reseller');
As you can see, 3 of the 4 tables use their 'name' as the primary key. I know that these will always be unique. In 2 of the cases (the *_status tables) I am basically using a dynamic replacement for ENUM, since status options could change in the future, and for the 'website' table, I know that the 'name' of the website will always be unique.
I'm wondering whether this is sound logic (getting rid of table IDs when I know the name is always going to be a unique identifier) or a recipe for disaster. I'm not a seasoned DBA, so any feedback, critique, etc. would be extremely helpful.
Thanks for taking the time to read this!
Comments (11)
You NEVER know when the company you work for will suddenly explode in growth and you have to hire 5 developers overnight. Your best bet is to use numeric (integer) primary keys because they will be much easier for the entire team to work with AND will help your performance if and when the database grows. If you have to break records out and partition them, you might want to use the primary key. If you are adding records with a datetime stamp (as every table should), and there is an error somewhere in the code that updates that field incorrectly, the only way to confirm whether the record was entered in the proper sequence is to check the primary keys. There are probably 10 more T-SQL or debugging reasons to use INT primary keys, not the least of which is writing a simple query to select the last 5 records entered into the table.
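The "last 5 records" query mentioned above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real server, and the trimmed `client` table and sample names are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for the real server; the schema is a trimmed,
# hypothetical version of the `client` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " lastname TEXT NOT NULL)"
)
names = ["Adams", "Baker", "Clark", "Davis", "Evans", "Frank", "Green"]
conn.executemany("INSERT INTO client (lastname) VALUES (?)", [(n,) for n in names])

# The surrogate key grows with each insert, so ordering by it in descending
# order returns the most recently inserted rows first.
last_five = conn.execute(
    "SELECT id, lastname FROM client ORDER BY id DESC LIMIT 5"
).fetchall()
print(last_five)
```

MySQL accepts the same `ORDER BY id DESC LIMIT 5` form; T-SQL would use `SELECT TOP 5 ... ORDER BY id DESC` instead.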
Here are a few points to consider before deciding on the keys for a table:
1. A numeric key is more suitable when you use references (foreign keys). Since you are not using foreign keys, it is OK in your case to use a non-numeric key.
2. A non-numeric key uses more space than a numeric key, which can decrease performance.
3. Numeric keys are simpler to understand (you can estimate the number of rows just by looking at the last row).
I think I am in agreement with cheduardo. It has been 25 years since I took a course in database design, but I recall being told that database engines can more efficiently manage and load indexes that use character keys. The comments about the database having to update thousands of records when a key is changed, and about all of the added space being taken up by the longer keys and then having to be transferred across systems, assume that the key is actually stored in the records and that it would not have to be transferred across systems anyway. If you create an index on one or more columns of a table, I do not think the value is stored in the records of the table (unless you set some option to do so).
If you have a natural key for a table, even if it is changed occasionally, creating another key creates a redundancy that could result in data integrity issues and actually creates even more information that needs to be stored and transferred across systems. I work for a team that decided to store the local application settings in the database. They have an identity column for each setting, a section name, a key name, and a key value. They have a stored procedure (another holy war) to save a setting that is meant to ensure it does not appear twice. I have yet to find a case where I would use a setting's ID. I have, however, ended up with multiple records with the same section and key name, which caused my application to fail. And yes, I know that could have been avoided by defining a constraint on the columns.
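The constraint mentioned in the last sentence might look like the sketch below. The table and column names (`app_setting`, `section`, `key_name`, `key_value`) are hypothetical, and SQLite (via Python's sqlite3 module) stands in for whatever engine the team actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical settings table: the identity column stays, but the natural
# key (section, key_name) is declared UNIQUE so duplicates are rejected
# by the database itself rather than by a stored procedure.
conn.execute(
    "CREATE TABLE app_setting ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " section TEXT NOT NULL,"
    " key_name TEXT NOT NULL,"
    " key_value TEXT NOT NULL,"
    " UNIQUE (section, key_name))"
)
conn.execute(
    "INSERT INTO app_setting (section, key_name, key_value) VALUES (?, ?, ?)",
    ("ui", "theme", "dark"),
)

try:
    # A second row with the same section and key name violates the constraint.
    conn.execute(
        "INSERT INTO app_setting (section, key_name, key_value) VALUES (?, ?, ?)",
        ("ui", "theme", "light"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM app_setting").fetchone()[0]
print(duplicate_rejected, row_count)
```

With the constraint in place, the duplicate insert fails no matter which code path attempts it.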
Kimberly Tripp has an excellent series of blog articles (GUIDs as PRIMARY KEYs and/or the clustering key and The Clustered Index Debate Continues) on the issue of creating clustered indexes and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:
1. Unique
2. Narrow
3. Static
4. Ever-increasing
Using "Name" as your key, while it seems to satisfy #1, doesn't satisfy ANY of the other three.
Even for your "lookup" table, what if your boss decides to change all affiliates to partners instead? You'll have to modify all rows in the database that use this value.
From a performance perspective, I'm probably most concerned that a key be narrow. If your website name is actually a long URL, then that could really bloat the size of any non-clustered indexes, and all tables that use it as a foreign key.
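To make the "affiliates to partners" scenario concrete, here is a small sketch of what renaming a natural key implies. It assumes a real foreign key with ON UPDATE CASCADE (the tables in the question only declare plain indexes), uses a trimmed version of the `client` and `client_status` tables, and runs on SQLite via Python's sqlite3 module rather than on MySQL or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE client_status (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE client ("
    " id INTEGER PRIMARY KEY,"
    " client_status TEXT NOT NULL"
    " REFERENCES client_status (name) ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO client_status (name) VALUES ('affiliate'), ('customer')")
conn.executemany(
    "INSERT INTO client (client_status) VALUES (?)",
    [("affiliate",)] * 3 + [("customer",)] * 2,
)

# Renaming the natural key is one statement on the lookup table, but the
# cascade has to rewrite every child row that stored the old value.
conn.execute("UPDATE client_status SET name = 'partner' WHERE name = 'affiliate'")

rewritten = conn.execute(
    "SELECT COUNT(*) FROM client WHERE client_status = 'partner'"
).fetchone()[0]
leftover = conn.execute(
    "SELECT COUNT(*) FROM client WHERE client_status = 'affiliate'"
).fetchone()[0]
print(rewritten, leftover)
```

With three child rows the cost is trivial; the concern raised here is the same rewrite across millions of rows and every index that contains the key.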
Besides all the other excellent points that have already been made, I would add one more word of caution against using large fields as clustering keys in SQL Server (if you're not using SQL Server, then this probably doesn't apply to you).
I add this because in SQL Server, the primary key on a table is by default also the clustering key (you can change that if you want to and know about it, but in most cases it's not done).
The clustering key that determines the physical ordering of the SQL Server table is also being added to every single non-clustered index on that table. If you have only a few hundred to a few thousand rows and one or two indices, that's not a big deal. But if you have really large tables with millions of rows, and potentially lots of indices to speed up the queries, this will indeed cause a lot of disk space and server memory to be wasted unnecessarily.
E.g. if your table has 10 million rows and 10 non-clustered indices, and your clustering key is 26 bytes instead of 4 (for an INT), then you're wasting 10 million × 10 × 22 bytes, for a total of 2.2 billion bytes (roughly 2.2 GB). That's not peanuts anymore!
Again - this only applies to SQL Server, and only if you have really large tables with lots of non-clustered indices on them.
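The estimate above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the figures given in the answer; it ignores page and row overhead, so real numbers will differ.

```python
rows = 10_000_000            # rows in the table
nonclustered_indexes = 10    # each one carries a copy of the clustering key
wide_key_bytes = 26          # e.g. a 26-byte character key
int_key_bytes = 4            # a 4-byte INT key

# Extra bytes stored because every index entry holds the wider key.
extra_bytes = rows * nonclustered_indexes * (wide_key_bytes - int_key_bytes)
print(extra_bytes)        # 2200000000
print(extra_bytes / 1e9)  # 2.2 (billions of bytes)
```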
Marc
I would argue that a database that is resistant to corruption, even if it runs a little slower, is better than one that isn’t.
In general, surrogate keys (such as arbitrary numeric identifiers) undermine the integrity of the database. Primary keys are the main way of identifying rows in the database; if the primary key values are not meaningful, the constraint is not meaningful. Any foreign keys that refer to surrogate primary keys are therefore also suspect. Whenever you have to retrieve, update or delete individual rows (and be guaranteed of affecting only one), the primary key (or another candidate key) is what you must use; having to work out what a surrogate key value is when there is a meaningful alternative key is a redundant and potentially dangerous step for users and applications.
Even if it means using a composite key to ensure uniqueness, I would advocate using a meaningful, natural set of attributes as the primary key, whenever possible. If you need to record the attributes anyway, why add another one? That said, surrogate keys are fine when there is no natural, stable, concise, guaranteed-to-be-unique key (e.g. for people).
You could also consider using index key compression, if your DBMS supports it. This can be very effective, especially for indexes on composite keys (think trie data structures), and especially if the least selective attributes can appear first in the index.
This just seems like a really bad idea. What if you need to change the value of the enum? The idea is to make it a relational database and not a set of flat files. At this point, why have the client_status table? Moreover, if you are using the data in an application, by using a type like a GUID or INT you can validate the type and avoid bad data (insofar as validating the type goes). Thus, it is another of many lines of defense against hacking.
Personally, I think you will run into trouble using this idea. As you end up with more parent-child relationships, you end up with a huge amount of work when the names change (as they always will, sooner or later). There can be a big performance hit when you have to update a child table that has thousands of rows because the name of the website changed. And you have to plan for how to make sure that those changes happen. Otherwise, when the website name changes (oops, we let the name expire and someone else bought it), either the update breaks because of the foreign key constraint, or you need an automated way (cascading update) to propagate the change through the system. If you use cascading updates, then you can suddenly bring your system to a dead halt while a large change is processed. This is not considered to be a good thing. It really is more effective and efficient to use ids for relationships and then put a unique index on the name field to ensure the names stay unique. Database design needs to consider maintenance of the data integrity and how that will affect performance.
Another thing to consider is that website names tend to be longer than a few characters. This means the performance difference between using an id field for joins and using the name for joins could be quite significant. You have to think of these things at the design phase, as it is too late to change to an ID when you have a production system with millions of records that is timing out and the fix is to completely restructure the database and rewrite all of the SQL code. That is not something you can fix in fifteen minutes to get the site working again.
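The design suggested above (an id for relationships plus a unique index on the name) can be sketched as follows. The `page` child table and the sample names are hypothetical, and SQLite (via Python's sqlite3 module) stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
# Surrogate id for relationships; UNIQUE keeps the name a candidate key.
conn.execute(
    "CREATE TABLE website ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL UNIQUE)"
)
# Hypothetical child table that references the website by id, not by name.
conn.execute(
    "CREATE TABLE page ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " website_id INTEGER NOT NULL REFERENCES website (id),"
    " path TEXT NOT NULL)"
)
conn.execute("INSERT INTO website (name) VALUES ('old-name.example')")
site_id = conn.execute("SELECT id FROM website").fetchone()[0]
conn.executemany(
    "INSERT INTO page (website_id, path) VALUES (?, ?)",
    [(site_id, p) for p in ("/", "/about", "/contact")],
)

# Renaming the site touches exactly one row; child rows are left alone.
before = conn.total_changes
conn.execute("UPDATE website SET name = 'new-name.example' WHERE id = ?", (site_id,))
rows_touched = conn.total_changes - before

# The join still resolves because the relationship never depended on the name.
pages_for_site = conn.execute(
    "SELECT COUNT(*) FROM page JOIN website ON website.id = page.website_id"
    " WHERE website.name = 'new-name.example'"
).fetchone()[0]
print(rows_touched, pages_for_site)
```

The trade-off is an extra column and index on `website` in exchange for renames that never touch the child tables.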
"If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's."
If you're absolutely sure you will never ever have uniqueness violation, then don't bother to define the key.
When making natural PRIMARY KEY's, make sure their uniqueness is under your control.
If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's.
Since website_status and client_status seem to be generated and used by you and only by you, it's acceptable to use them as a PRIMARY KEY, though having a long key may impact performance.
The website name seems to be under the control of the outside world, which is why I'd make it a plain field. What if they want to rename their website?
The counterexamples would be SSN and ZIP codes: it's not you who generates them, and there is no guarantee that they will never be duplicated.
There are 2 reasons I would always add an ID number to a lookup / ENUM table:
In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.
EDIT:
As stated above, you will run into problems if the website is renamed. By making the name the primary key, you make it very difficult, if not impossible, to change it at a later date.