Is searching a table by Guid faster when the Guid is the clustered index?
If I am going to be querying a table by Guids (regardless of the fragmentation problems with Guids), would it be faster to have the Guid as the clustered index rather than as a non-clustered index or with no index at all?
This question comes from a read-only standpoint. I'm just curious whether looking up the row for a specific Guid completes faster with or without an index, and with a clustered versus a non-clustered index.
Alternatively (I'm fairly certain of the answer here), apply the same question to int identifiers: will searches be faster if the table is clustered by that int rather than by some other column in the table?
I know there are many other questions posted on this topic, but I haven't found the specific answer that I'm looking for in any of these:
Should a Sequential Guid primary key column be a clustered index?
Improving performance of cluster index GUID primary key
Clustered primary key on unique identifier ID column in SQL Server
uniqueidentifier with index
Should I get rid of clustered indexes on Guid columns
Thanks for any help!
Comments (3)
A table will generally query faster with an integer clustered index than with a GUID one. The reason is the size of the data type: an INT is 4 bytes, a GUID is 16.
If you have already decided to use GUIDs as the key, consider generating them with NEWSEQUENTIALID() instead of NEWID(). This reduces fragmentation in GUID indexes, because the IDs are always increasing and new rows are less likely to cause a page split.
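For illustration, a minimal sketch of what that could look like. The table, column, and constraint names are made up for this example and are not from the question; NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, which is why the key is left out of the INSERT.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE dbo.Customer
(
    CustomerId  uniqueidentifier NOT NULL
        CONSTRAINT DF_Customer_CustomerId DEFAULT NEWSEQUENTIALID(),
    DisplayName nvarchar(100)    NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
);

-- The key column is omitted so the DEFAULT supplies a sequential GUID.
INSERT INTO dbo.Customer (DisplayName)
VALUES (N'Alice'), (N'Bob');

-- Inspect the generated keys in clustered-index order.
SELECT CustomerId, DisplayName
FROM dbo.Customer
ORDER BY CustomerId;
```

Values generated this way increase on a given machine, so new rows tend to land at the end of the index. As far as I know the sequence can restart from a lower range after a Windows restart, so treat "always increasing" as "usually increasing".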
Adding to my point: clustering on this key is a natural choice unless the table has a better candidate for the clustered index, for example when the GUID is not actually being used as the key. If the table is relatively small you can get away with no index at all; otherwise an index on the column you search by is almost always worthwhile.
Assuming MS SQL Server. This may or may not apply to other RDBMSs:
If you have a clustered index then it will be fastest, although if you're searching for a single row then the difference between that and a non-clustered index will be negligible. When you use a non-clustered index the server first needs to find the right value in the index and then fetch the full record from the table storage. The table storage is the clustered index, so searching by the clustered index eliminates that step (called a Bookmark Lookup, shown as a Key Lookup in newer versions of SQL Server), but that step is almost imperceptible for a single row.
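To see the two access paths side by side, something like the following sketch could be used. Both tables and all names are hypothetical, the GUID literal is an arbitrary placeholder, and the plan shapes in the comments are what I would typically expect rather than something guaranteed for every data set.

```sql
-- Variant A (hypothetical): the GUID is the clustered key.
CREATE TABLE dbo.OrderA
(
    OrderGuid uniqueidentifier NOT NULL,
    Payload   varchar(200)     NOT NULL,
    CONSTRAINT PK_OrderA PRIMARY KEY CLUSTERED (OrderGuid)
);

-- Variant B (hypothetical): clustered on an int, GUID in a non-clustered index.
CREATE TABLE dbo.OrderB
(
    OrderId   int IDENTITY(1,1) NOT NULL,
    OrderGuid uniqueidentifier  NOT NULL,
    Payload   varchar(200)      NOT NULL,
    CONSTRAINT PK_OrderB PRIMARY KEY CLUSTERED (OrderId),
    CONSTRAINT UQ_OrderB_OrderGuid UNIQUE NONCLUSTERED (OrderGuid)
);

-- Placeholder value; substitute a GUID that exists in your data.
DECLARE @g uniqueidentifier = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

SET STATISTICS IO ON;  -- reports logical reads per statement

-- Typically a single Clustered Index Seek: the leaf page already holds the row.
SELECT Payload FROM dbo.OrderA WHERE OrderGuid = @g;

-- Typically an Index Seek on the GUID index plus a Key Lookup into the
-- clustered index to fetch Payload.
SELECT Payload FROM dbo.OrderB WHERE OrderGuid = @g;

SET STATISTICS IO OFF;
```

On populated tables, the logical reads reported for the second query should come out a few pages higher than for the first, which is the extra lookup step described above.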
Clustered indexes tend to provide a bigger advantage for reads when they are on a column that is selected by range (for example, a transaction date column when you want to find all transactions for the past month). In that case the server can find the start of the range and just read off the data in one quick, sequential sweep.
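A rough sketch of that transaction-date case, assuming a made-up table (none of these names come from the question). The primary key is declared NONCLUSTERED so that the clustered index can go on the date column instead.

```sql
-- Hypothetical table; names and column types are illustrative assumptions.
CREATE TABLE dbo.AccountTransaction
(
    TransactionId   int IDENTITY(1,1) NOT NULL,
    TransactionDate datetime          NOT NULL,
    Amount          decimal(12, 2)    NOT NULL,
    CONSTRAINT PK_AccountTransaction PRIMARY KEY NONCLUSTERED (TransactionId)
);

-- Rows are stored in TransactionDate order (the key does not need to be unique).
CREATE CLUSTERED INDEX CIX_AccountTransaction_TransactionDate
    ON dbo.AccountTransaction (TransactionDate);

-- All transactions from the past month: one seek to the start of the range,
-- then a sequential read of adjacent pages.
SELECT TransactionId, TransactionDate, Amount
FROM dbo.AccountTransaction
WHERE TransactionDate >= DATEADD(MONTH, -1, GETDATE())
ORDER BY TransactionDate;
```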
Having a non-clustered index on an INT (all other things being equal) will be slightly faster than one on a GUID, because the index itself will be smaller (an INT is 4 bytes, a GUID is 16), which means the server has to traverse fewer pages to find the value it is looking for. In the case of a clustered index I don't think you'll see much of a difference if your rows are already large compared to the 12-byte difference between a GUID and an INT, but I haven't done any testing on that.
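If you want to test this on your own data, one option is to compare index depth and page counts with sys.dm_db_index_physical_stats. This is only a sketch: dbo.YourTable is a placeholder for whatever table you are measuring, and you would run it once against the INT-keyed variant and once against the GUID-keyed one.

```sql
-- dbo.YourTable is a placeholder; replace it with the table being measured.
SELECT
    i.name                          AS index_name,
    i.type_desc                     AS index_type,
    ps.index_depth,                 -- B-tree levels traversed by a seek
    ps.page_count,                  -- pages in the leaf level
    ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                        -- current database
         OBJECT_ID(N'dbo.YourTable'),
         NULL,                           -- all indexes
         NULL,                           -- all partitions
         'LIMITED') AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.alloc_unit_type_desc = 'IN_ROW_DATA';
```

A single-row seek costs roughly one page read per level, so index_depth is the number to watch; page_count shows how much larger the GUID-keyed index is overall.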
Like Tom already mentioned, a search on a clustered index for a single element will always be faster. This is because the clustered index is the data itself, so no lookup is required after you have found your index entry.
The main advantage for a clustered index is the ability to extract "ranges" of data (like "last week", or "Orderhistory by Date"). Since a GUID tends to spread evenly over the table, you will fail to gain this benefit here. Also each table can only have one clustered index, so pick carefully.
If you most commonly query a table for a specific range, then consider the column that range is on for the clustered index.
There is also a third option, called a covering index. A covering index is a non-clustered index that contains every column needed to satisfy your most common query, so the server never has to go back to the table. For example, say you have a USER table with ID, DisplayName, Password, LogonDate, ... and you frequently need the DisplayName. An index on ID, DisplayName would be a covering index for a query like
SELECT DisplayName FROM [USER] WHERE ID = XYZ
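A minimal sketch of that index, under a few assumptions of mine: the table is called dbo.AppUser here to sidestep the reserved word USER, it is clustered on LogonDate (otherwise a clustered index on ID would already cover the query), and the column types are guesses.

```sql
-- Hypothetical table; names, types, and the clustering choice are assumptions.
CREATE TABLE dbo.AppUser
(
    ID          int           NOT NULL,
    DisplayName nvarchar(100) NOT NULL,
    Password    varbinary(64) NOT NULL,
    LogonDate   datetime      NOT NULL,
    CONSTRAINT PK_AppUser PRIMARY KEY NONCLUSTERED (ID)
);

CREATE CLUSTERED INDEX CIX_AppUser_LogonDate
    ON dbo.AppUser (LogonDate);

-- The covering index from the answer: both columns are part of the index key.
CREATE NONCLUSTERED INDEX IX_AppUser_ID_DisplayName
    ON dbo.AppUser (ID, DisplayName);

-- Answered entirely from IX_AppUser_ID_DisplayName, with no lookup into the table.
DECLARE @id int = 42;  -- placeholder value
SELECT DisplayName
FROM dbo.AppUser
WHERE ID = @id;
```

On SQL Server 2005 and later, an alternative is to key the index on ID only and add DisplayName with INCLUDE (DisplayName), which keeps the extra column at the leaf level and out of the index key.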
Edit:
One thing I forgot to mention. A GUID is quite a large object when it comes to SQL (well... 16 bytes). Having it as the clustered index forces every other index on that table to include the 16-byte GUID as its pointer back to the row. This can add up if you have a bunch of indexes on that table. A clustered index works best when its key is small and unique. That's why INTs are so nice.