Unique identifier for users
If I have a table of a hundred users, I would normally just set up an auto-increment userID column as the primary key. But if we suddenly have a million or 5 million users, that becomes really difficult, because I would want to start distributing the database, in which case an auto-increment primary key would be useless, as each node would be creating the same primary keys.
Is the solution to use natural primary keys? I am having a really hard time thinking of a natural primary key for this bunch of users. The problem is that they are all young people, so they do not have national insurance numbers or any other unique identifier I can think of. I could create a multi-column primary key, but there is still a chance, however minuscule, of duplicates occurring.
Does anyone know of a solution?
Thanks
Comments (9)
I would say that for the time being, keep an auto-increment for the user ID.
When you do have that sudden rush of millions of users, then you can think about changing it.
In other words, solve the problem when you have it. "Premature optimization is the root of all evil."
To answer the question: some databases let you seed the auto-increment, so each node can generate a different sequence. This avoids the problem while still allowing use of an auto-increment.
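A minimal sketch of the seeding idea, simulated in Python rather than in any particular database. Each node starts at a different offset and steps by the total number of nodes, so the sequences never overlap. The function name `node_id_sequence` and the three-node setup are assumptions made for illustration. MySQL exposes the same idea through its `auto_increment_offset` and `auto_increment_increment` settings, and SQL Server through `IDENTITY(seed, increment)`.

```python
from itertools import count, islice


def node_id_sequence(node_index: int, node_count: int):
    """Return an endless iterator of IDs for one node.

    Node 0 of 3 yields 1, 4, 7, ...; node 1 yields 2, 5, 8, ...; and so on.
    The start is the per-node seed, the step is the number of nodes.
    """
    if not 0 <= node_index < node_count:
        raise ValueError("node_index must be in [0, node_count)")
    return count(start=node_index + 1, step=node_count)


# Hypothetical three-node cluster: take the first four IDs from each node.
node_a = list(islice(node_id_sequence(0, 3), 4))
node_b = list(islice(node_id_sequence(1, 3), 4))
node_c = list(islice(node_id_sequence(2, 3), 4))
```

The catch with this scheme is that the step has to be chosen up front: adding a fourth node later means picking a larger step and re-seeding every node.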
The standard solution here is to use a GUID. They won't perform as well in terms of indexing, though.
GUIDs are good, but are subject to collision (albeit rare).
This might be a nonstandard solution, but I'm gonna throw it out there:
You can use auto-incrementing numbers, but segregate the number space by server once you distribute in the future.
So let's say you have 3 servers. Allocate the IDs as follows:
Server 1: 0 - 9,999,999
Server 2: 10,000,000 - 19,999,999
Server 3: 20,000,000 - 29,999,999
Even within the constraints of a 32-bit int, that should leave plenty of expansion space (could even use gaps of 100,000,000 if you're worried), and it essentially guarantees uniqueness across the system.
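The range scheme above could be sketched like this. It is only an illustration: the class name `RangeAllocator` and its parameters are hypothetical, and a real deployment would let the database do the counting rather than application code. The block for a server is derived from its index, and the allocator refuses to run past the end of its block instead of spilling into the next server's range.

```python
class RangeAllocator:
    """Hand out IDs from a fixed block reserved for one server."""

    def __init__(self, server_index: int, block_size: int = 10_000_000):
        if server_index < 0 or block_size <= 0:
            raise ValueError("server_index must be >= 0 and block_size > 0")
        # Server 0 owns 0..9,999,999, server 1 owns 10,000,000..19,999,999, etc.
        self.low = server_index * block_size
        self.high = self.low + block_size - 1
        self._next = self.low

    def next_id(self) -> int:
        # Fail loudly rather than collide with the neighbouring server's block.
        if self._next > self.high:
            raise OverflowError("ID block exhausted for this server")
        value = self._next
        self._next += 1
        return value


server_2 = RangeAllocator(server_index=1)
first_id = server_2.next_id()
second_id = server_2.next_id()
```

A signed 32-bit integer tops out at 2,147,483,647, so 10,000,000-wide blocks leave room for 214 servers, and 100,000,000-wide blocks for 21.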
If you need millions of IDs and have many nodes, make the primary key a composite of:
which is way better than a GUID (smaller, uses less memory, and will be faster).
Never use natural primary keys unless you want bad performance and the potential for bad data. There are very few natural keys that are not subject to change over time, especially names. If a natural key changes, then all related child records must also change. This is clearly bad.
You could use GUIDs. But 5 million is nothing in terms of data and likely would not require a change. We have over 10,000,000 different people in our system and we only have a medium-sized database with no partitioning or need for GUIDs.
A GUID is an easy way out but...
How distributed does it need to be? If there is a limited number of databases, you can give each database a range of numbers to use. For example, the first database auto-generates numbers in the range 0 to 999,999 and the next uses 1,000,000 to 1,999,999. That way they can each generate a user ID without bumping into each other. If each database includes a unique number identifying it, then the ranges can be generated automatically from this number.
I don't think you can use an auto-increment column to do this but a stored procedure could generate numbers in this manner.
GUIDs are rubbish as keys when clustered. If non-clustered, you'll still need a clustered index on another column.
Use an integer key and for each new node/site
If you do have nodes/sites then a second column with SiteID will work too.
If you're using MSSQL, you can create the PK of your table as UNIQUEIDENTIFIER and set the Default Value or Binding to NEWID().
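For comparison, here is a rough Python sketch of the same idea outside SQL Server. It does not use `NEWID()`; it generates a random (version 4) UUID on the client with the standard `uuid` module and stores it as the key in an in-memory SQLite table. The table layout and the `add_user` helper are assumptions made for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# The GUID is stored as text here; SQL Server would use UNIQUEIDENTIFIER.
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT NOT NULL)")


def add_user(connection: sqlite3.Connection, name: str) -> str:
    """Insert a user keyed by a freshly generated random UUID."""
    user_id = str(uuid.uuid4())
    connection.execute(
        "INSERT INTO users (user_id, name) VALUES (?, ?)", (user_id, name)
    )
    return user_id


first = add_user(conn, "alice")
second = add_user(conn, "bob")
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Because the key is generated without consulting the database, any node can create rows independently, which is the whole appeal of GUIDs here.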
I suggest you never consider GUIDs. One reason is that I am currently having trouble with them. If you have millions of users, you will likely need a higher degree of concurrency, and GUIDs will ruin your life on inserts and deletes, because you will have an index on them, and by default it will be a clustered index. With a clustered index, rows are stored in key order, and GUIDs are not sequential, so a new row almost never lands at the end of the index. It lands in the middle of an existing page instead, which can force records to be moved physically. The overall insert and delete operations therefore become very costly, and if you remove the index, your selects become costly instead.
Especially if you have multiple tables with relations between them, don't consider GUIDs as the primary key.
There are two solutions I would recommend:
1. If you can make composite keys, that would be perfect. For example, in banking software, (branchId, transactionId) could be the primary key, where branchId identifies the node inserting the record and transactionId is an auto-number local to the branch, so you get uniqueness all the way.
2. If the above is not something you want to do, you can use the GUID as a unique field but add an auto-increment number as the primary key. This helps reduce the overall cost. For example, when a client (node) sends data via a (web service) RPC, the server inserts the record into its database, which generates an auto-number. That auto-number can be used for future selects, deletes, or updates, but the client does not have to know about it.
I understand that the second solution is a bit confusing and complex, but it is still better than using GUIDs as the PK. If solution 1 is applicable, go for it.
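Both solutions could look roughly like this, sketched with Python's built-in `sqlite3` module against an in-memory database. The table and column names are invented for the example, and SQLite's storage differs from SQL Server's clustered indexes, so this only shows the key layout, not the performance behaviour.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Solution 1: composite key of (node identity, per-node counter).
    CREATE TABLE transactions (
        branch_id      INTEGER NOT NULL,
        transaction_id INTEGER NOT NULL,
        amount         INTEGER NOT NULL,
        PRIMARY KEY (branch_id, transaction_id)
    );
    -- Solution 2: integer primary key, GUID kept only as a unique column.
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY AUTOINCREMENT,
        guid    TEXT NOT NULL UNIQUE,
        name    TEXT NOT NULL
    );
    """
)

# Two branches can both use transaction_id 1 without clashing.
conn.execute("INSERT INTO transactions VALUES (1, 1, 500)")
conn.execute("INSERT INTO transactions VALUES (2, 1, 750)")

# The same (branch_id, transaction_id) pair is rejected.
try:
    conn.execute("INSERT INTO transactions VALUES (1, 1, 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The client supplies a GUID; the server assigns the integer key.
client_guid = str(uuid.uuid4())
cursor = conn.execute(
    "INSERT INTO users (guid, name) VALUES (?, ?)", (client_guid, "alice")
)
server_user_id = cursor.lastrowid
```

In the second table, foreign keys in other tables would reference the small integer `user_id`, while clients keep addressing the row by its GUID.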
When I say cost, I mean not only processing time but also lock (wait) time, which is a total waste of money: your quad-core server may be doing half the work it could. More locks also mean more chance of deadlocks. So, my friend, never use GUIDs.
Regards
Mubashar