Sequential GUIDs and fragmentation

Posted 2024-09-13 12:11:53

I'm trying to understand why a sequential GUID performs better than a regular GUID.

Is it because, with a regular GUID, the index uses the last byte of the GUID to sort? Since the values are random, does that cause a lot of fragmentation and page splits, because data often has to be moved to another page to make room for the new row?

And since a sequential GUID is sequential, does it cause far fewer page splits and much less fragmentation?

Is my understanding correct?

If anyone can shed more light on the subject, I'd appreciate it very much.

Thank you

EDIT:

Sequential guid = NEWSEQUENTIALID(),

Regular guid = NEWID()

Comments (3)

林空鹿饮溪 2024-09-20 12:11:53

You've pretty much said it all in your question.

With a sequential GUID / primary key, new rows are added together at the end of the table, which makes things nice and easy for SQL Server. In comparison, a random primary key means that new records could be inserted anywhere in the table. The last page of the table is fairly likely to be in the cache (if that's where all of the reads are going), but a random page in the middle of the table is far less likely to be cached, meaning additional IO is required.
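The caching argument can be sketched with a toy model. This is not how SQL Server's buffer pool actually works: it assumes a plain LRU cache, and the table size, cache size and rows per page are all made-up numbers chosen only to make the effect visible.

```python
import random
from collections import OrderedDict


def cache_hit_rate(page_accesses, cache_pages):
    """Fraction of page accesses served from an LRU cache of `cache_pages` pages."""
    cache = OrderedDict()  # page number -> None, least recently used first
    hits = 0
    total = 0
    for page in page_accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = None             # the page has to be read from disk
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / total if total else 0.0


# All four constants are hypothetical.
ROWS_PER_PAGE = 100
TABLE_PAGES = 10_000   # pages already in the table
CACHE_PAGES = 500      # only 5% of the table fits in memory
INSERTS = 100_000

# Sequential keys: every insert lands on the current last page.
sequential = [TABLE_PAGES + i // ROWS_PER_PAGE for i in range(INSERTS)]

# Random keys: every insert lands on an arbitrary existing page.
rng = random.Random(1)
scattered = [rng.randrange(TABLE_PAGES) for _ in range(INSERTS)]

if __name__ == "__main__":
    print("sequential hit rate:", cache_hit_rate(sequential, CACHE_PAGES))
    print("random hit rate:    ", cache_hit_rate(scattered, CACHE_PAGES))
```

In this model the sequential workload only misses the first time it touches each new last page, while the random workload misses most of the time because the page it needs has usually been evicted.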

On top of that, when a row is inserted into the middle of the table, the target page may not have enough room for it. In that case SQL Server has to perform additional, expensive IO to make room for the record (a page split). The only way to avoid this is to leave gaps scattered among the data so that extra records can be inserted (known as a fill factor), which itself causes performance issues: the data is spread over more pages, so more IO is required to read the entire table.
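A rough way to see the page-split effect is to simulate it. The sketch below is a simplified model, not SQL Server's storage engine: pages hold a fixed, made-up number of rows, a full page in the middle of the index is split in half, and an insert past the end of the index is assumed to start a fresh page without moving any rows. Random 128-bit integers stand in for NEWID() values and a simple counter stands in for NEWSEQUENTIALID().

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical rows per page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert non-negative integer keys into sorted, fixed-size pages.

    Returns (row_moving_splits, average_page_fill).
    """
    pages = [[]]   # pages in key order; each page is a sorted list of keys
    lows = [-1]    # lows[i] is a lower bound for the keys stored on pages[i]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1   # page whose range covers key
        page = pages[i]
        if len(page) == capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Appending past the end: start a fresh page, move nothing.
                pages.append([key])
                lows.append(key)
                continue
            # Full page in the middle: move its upper half to a new page.
            splits += 1
            upper = page[capacity // 2:]
            del page[capacity // 2:]
            pages.insert(i + 1, upper)
            lows.insert(i + 1, upper[0])
            if key >= upper[0]:
                page = upper
        bisect.insort(page, key)
    rows = sum(len(p) for p in pages)
    return splits, rows / (len(pages) * capacity)


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(42)
    random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for NEWID()
    sequential_keys = range(n)              # stand-in for NEWSEQUENTIALID()
    for label, keys in (("random", random_keys), ("sequential", sequential_keys)):
        splits, fill = simulate_inserts(keys)
        print(f"{label:10s} splits={splits:4d} average fill={fill:.0%}")
```

With sequential keys the model never moves a row and every page ends up full. With random keys pages keep splitting and are left partly empty, which is the same trade-off a low fill factor makes deliberately.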

泛泛之交 2024-09-20 12:11:53

I defer to Kimberly L. Tripp's wisdom on this topic:

"But, a GUID that is not sequential - like one that has its values generated in the client (using .NET) OR generated by the newid() function (in SQL Server) can be a horribly bad choice - primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an int-based identity - which can give you 2 billion (really, 4 billion) unique rows). And, if you need more than 2 billion you can always go with a bigint (8-byte int) and get 2^63-1 rows."

Read more: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx#ixzz0wDK6cece
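The size figures in the quote are easy to check. The snippet below only does the arithmetic for the three key types mentioned (4-byte int, 8-byte bigint, 16-byte uniqueidentifier); the row count used for the storage estimate is a made-up example, and the estimate covers the raw key values only, not row or index overhead.

```python
def max_signed(byte_width):
    """Largest value of a signed two's-complement integer of this width."""
    return 2 ** (8 * byte_width - 1) - 1


# Storage size in bytes of each candidate key type.
KEY_BYTES = {"int": 4, "bigint": 8, "uniqueidentifier": 16}

if __name__ == "__main__":
    print("int maximum:        ", max_signed(KEY_BYTES["int"]))      # the "2 billion"
    print("int distinct values:", 2 ** 32)                           # the "4 billion"
    print("bigint maximum:     ", max_signed(KEY_BYTES["bigint"]))   # 2^63-1
    rows = 100_000_000  # hypothetical table size
    for name, width in KEY_BYTES.items():
        print(f"{name:17s}{rows * width / 1024 ** 2:8.0f} MiB of key values")
```

The "really, 4 billion" in the quote comes from counting the negative half of the int range as well.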

人生戏 2024-09-20 12:11:53

To see the whole picture, you can use a utility named ostress.
For example, you can create two tables: one with a normal GUID as the PK, and another with a sequential GUID:

-- normal one: the id is supplied by NEWID() in the INSERT statement
CREATE TABLE dbo.YourTable(
   [id] [uniqueidentifier] NOT NULL,
   CONSTRAINT [PK_YourTable] PRIMARY KEY NONCLUSTERED (id)
);
-- sequential one: the id comes from the NEWSEQUENTIALID() default
CREATE TABLE dbo.YourTableSeq(
   [id] [uniqueidentifier] NOT NULL CONSTRAINT [df_yourtable_id]  DEFAULT (newsequentialid()),
   CONSTRAINT [PK_YourTableSeq] PRIMARY KEY NONCLUSTERED (id)
);

Then, with that utility, you run a number of inserts, each followed by a query that selects the index fragmentation statistics:

ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTable VALUES (NEWID()); SELECT count(*) AS Cnt FROM dbo.YourTable; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTable';" -oE:\incoming\TMP\ -n1 -r10000

ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTableSeq DEFAULT VALUES; SELECT count(*) AS Cnt FROM dbo.YourTableSeq; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTableSeq';" -oE:\incoming\TMP\ -n1 -r10000

You will then find the statistics in the file E:\incoming\TMP\query.out.
My results were:

"Normal" GUID:
Records    AvgPageFragmentation     PageCounts           
---------------------------------------------- 
1000       87.5                     8                    
2000       93.75                    16                   
3000       96.15384615384616        26                   
4000       96.875                   32                   
5000       96.969696969696969       33                   
10000      98.571428571428584       70                   


Sequential GUID:
Records    AvgPageFragmentation     PageCounts           
---------------------------------------------- 
1000       83.333333333333343       6                    
2000       63.636363636363633       11                   
3000       41.17647058823529        17                   
4000       31.818181818181817       22                   
5000       25.0                     28                   
10000      12.727272727272727       55       

As you can see, when sequentially generated GUIDs are inserted the index is much less fragmented (about 13% versus 99% at 10,000 rows) and uses fewer pages (55 versus 70), because inserts lead to new page allocations less often.
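To make the fragmentation percentage more concrete, here is a small sketch of one way to read it: the share of pages whose logical successor is not the next physical page. This is a simplified interpretation, not SQL Server's exact algorithm, although the figures above are consistent with it (69/70 is 98.57% and 7/55 is 12.73%). The allocation model is also an assumption: appended pages go to the logical end, while a split places the new page right after a randomly chosen existing page.

```python
import random


def logical_fragmentation(physical_ids):
    """Percent of pages whose logical successor is not the next physical page.

    `physical_ids` lists physical page numbers in logical (key) order.
    """
    if len(physical_ids) < 2:
        return 0.0
    out_of_order = sum(
        1 for a, b in zip(physical_ids, physical_ids[1:]) if b != a + 1
    )
    return 100.0 * out_of_order / len(physical_ids)


def allocate_pages(page_count, random_splits, seed=0):
    """Return physical page numbers in logical order after `page_count` allocations."""
    rng = random.Random(seed)
    order = [0]
    for new_id in range(1, page_count):
        if random_splits:
            # A page somewhere in the index split; the new page follows it logically.
            pos = rng.randrange(len(order))
            order.insert(pos + 1, new_id)
        else:
            # Sequential keys: the new page is appended at the logical end.
            order.append(new_id)
    return order


if __name__ == "__main__":
    print("appended pages:", logical_fragmentation(allocate_pages(55, False)))
    print("random splits: ", logical_fragmentation(allocate_pages(70, True)))
```

The model ignores how the first few pages of a small index are allocated, so it does not reproduce the non-zero fragmentation measured for the sequential table; it only shows why random splits push the figure towards 100%.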
