What are the performance improvements of Sequential Guid over standard Guid?

Posted on 2024-07-07 19:41:22

Comments (8)

尛丟丟 2024-07-14 19:41:22

GUID vs. Sequential GUID

A typical pattern is to use a Guid as the PK for tables but, as noted in other discussions (see Advantages and disadvantages of GUID / UUID database keys), there are some performance issues.

This is a typical Guid sequence:

f3818d69-2552-40b7-a403-01a6db4552f7
7ce31615-fafb-42c4-b317-40d21a6a3c60
94732fc7-768e-4cf2-9107-f0953f6795a5

The problems with this kind of data are:

  • Wide distribution of values
  • Values are almost random
  • Index usage is very, very, very bad
  • A lot of leaf moving
  • Almost every PK needs to be at least on a non-clustered index
  • The problem happens on both Oracle and SQL Server
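
The "leaf moving" point can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a sorted in-memory list stands in for an index, "landing at the tail" stands in for appending to the last leaf page, and the class and method names are made up for this example. It counts how often a new key is inserted at the end of the sorted order for random keys versus monotonically increasing keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

public final class InsertLocality {

    /** Fraction of inserts that land at the tail of an already sorted list. */
    public static double tailFraction(List<UUID> keys) {
        List<UUID> sorted = new ArrayList<>();
        int tailInserts = 0;
        for (UUID key : keys) {
            int pos = Collections.binarySearch(sorted, key);
            if (pos < 0) {
                pos = -pos - 1; // binarySearch encodes the insertion point as -(point) - 1
            }
            if (pos == sorted.size()) {
                tailInserts++; // the key is larger than everything stored so far
            }
            sorted.add(pos, key);
        }
        return (double) tailInserts / keys.size();
    }

    /** Random (version 4) keys, as produced by a standard Guid generator. */
    public static List<UUID> randomKeys(int n) {
        List<UUID> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(UUID.randomUUID());
        }
        return keys;
    }

    /** Monotonically increasing keys, a stand-in for sequential Guids. */
    public static List<UUID> increasingKeys(int n) {
        List<UUID> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(new UUID(0L, i));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("random:     " + tailFraction(randomKeys(2000)));
        System.out.println("increasing: " + tailFraction(increasingKeys(2000)));
    }
}
```

With increasing keys every insert is an append; with random keys almost every insert lands somewhere in the middle, which is the access pattern that forces a real B-tree to touch and split pages all over the index.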

A possible solution is to use sequential Guids, which are generated as follows:

cc6466f7-1066-11dd-acb6-005056c00008
cc6466f8-1066-11dd-acb6-005056c00008
cc6466f9-1066-11dd-acb6-005056c00008
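
To see why these values are sequential, it can help to decode them. The sketch below uses Java's `java.util.UUID` (not the .NET API from this thread) to read the three sample values as version 1, time-based UUIDs; the class name and the `tickDelta` helper are made up for this example.

```java
import java.util.UUID;

public final class SequentialGuidInspect {

    /** Difference between the version 1 timestamps of two UUIDs, in 100 ns ticks. */
    public static long tickDelta(String earlier, String later) {
        return UUID.fromString(later).timestamp() - UUID.fromString(earlier).timestamp();
    }

    public static void main(String[] args) {
        String[] samples = {
            "cc6466f7-1066-11dd-acb6-005056c00008",
            "cc6466f8-1066-11dd-acb6-005056c00008",
            "cc6466f9-1066-11dd-acb6-005056c00008"
        };
        for (String s : samples) {
            UUID u = UUID.fromString(s);
            // timestamp() and node() are only defined for version 1 UUIDs
            System.out.printf("%s version=%d node=%012x timestamp=%d%n",
                    s, u.version(), u.node(), u.timestamp());
        }
        System.out.println("delta between first two = " + tickDelta(samples[0], samples[1]));
    }
}
```

All three samples share the same node and clock sequence and differ only in the timestamp field, which is what makes them sort close together.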

How to generate them from C# code:

[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

public static Guid SequentialGuid()
{
    const int RPC_S_OK = 0;
    Guid g;
    if (UuidCreateSequential(out g) != RPC_S_OK)
        return Guid.NewGuid();
    else
        return g;
}

Benefits

  • Better usage of the index
  • Allows usage of clustered keys (to be verified in NLB scenarios)
  • Less disk usage
  • A 20-25% performance increase at minimal cost

Real-life measurement

Scenario:

  • Guids stored as the UniqueIdentifier type on SQL Server
  • Guids stored as CHAR(36) on Oracle
  • Lots of insert operations, batched together in a single transaction
  • From 1 to 100s of inserts, depending on the table
  • Some tables with more than 10 million rows

Laboratory Test – SQL Server

VS2008 test, 10 concurrent users, no think time, benchmark process with 600 inserts in batch for the leaf table

Standard Guid

Avg. process duration: 10.5 sec

Avg. requests per second: 54.6

Avg. resp. time: 0.26

Sequential Guid

Avg. process duration: 4.6 sec

Avg. requests per second: 87.1

Avg. resp. time: 0.12

Results on Oracle (sorry, a different tool was used for the test): 1.327.613 inserts on a table with a Guid PK

Standard Guid: 0.02 sec. elapsed time for each insert, 2.861 sec. of CPU time, total of 31.049 sec. elapsed

Sequential Guid: 0.00 sec. elapsed time for each insert, 1.142 sec. of CPU time, total of 3.667 sec. elapsed

The "db file sequential read" wait time went from 6.4 million wait events for 62.415 seconds to 1.2 million wait events for 11.063 seconds.
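
As a rough aid for reading the figures above, the ratios they imply can be computed directly. This is only arithmetic on the quoted numbers, not a new measurement. Each ratio compares two numbers written in the same notation, so it does not depend on how the separators in the Oracle figures are read. The class name is made up for this example.

```java
public final class SpeedupFromQuotedFigures {

    /** How many times larger the "before" figure is than the "after" figure. */
    public static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // SQL Server lab test
        System.out.printf("process duration: %.2fx shorter%n", ratio(10.5, 4.6));
        System.out.printf("requests per second: %.2fx higher%n", ratio(87.1, 54.6));
        System.out.printf("response time: %.2fx shorter%n", ratio(0.26, 0.12));
        // Oracle test
        System.out.printf("total elapsed: %.2fx shorter%n", ratio(31.049, 3.667));
        System.out.printf("CPU time: %.2fx lower%n", ratio(2.861, 1.142));
        System.out.printf("wait events: %.2fx fewer%n", ratio(6.4, 1.2));
    }
}
```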

It's important to note that sequential Guids can be guessed, so they are not a good idea if security is a concern; in that case, keep using standard Guids.
In short: if you use a Guid as the PK, use sequential Guids whenever they are not passed back and forth from a UI. They will speed up operations and cost nothing to implement.

彼岸花似海 2024-07-14 19:41:22

I may be missing something here (feel free to correct me if I am), but I can see very little benefit in using sequential GUID/UUIDs for primary keys.

The point of using GUIDs or UUIDs over autoincrementing integers is:

  • They can be created anywhere without contacting the database
  • They are identifiers that are entirely unique within your application (and in the case of UUIDs, universally unique)
  • Given one identifier, there is no way to guess the next or previous (or even any other valid identifiers) outside of brute-forcing a huge keyspace.

Unfortunately, using your suggestion, you lose all those things.

So, yes. You've made GUIDs better. But in the process, you've thrown away almost all of the reasons to use them in the first place.

If you really want to improve performance, use a standard autoincrementing integer primary key. That provides all the benefits you described (and more) while being better than a 'sequential guid' in almost every way.

This will most likely get downmodded into oblivion as it doesn't specifically answer your question (which is apparently carefully-crafted so you could answer it yourself immediately), but I feel it's a far more important point to raise.

过去的过去 2024-07-14 19:41:22

As massimogentilini already said, performance can be improved by using UuidCreateSequential (when generating the guids in code). But one fact seems to be missing: SQL Server (at least Microsoft SQL Server 2005 / 2008) uses the same functionality, but the comparison/ordering of Guids differs between .NET and SQL Server, which would still cause more IO because the guids will not be ordered correctly.
To generate guids that are ordered correctly for SQL Server, you have to do the following (see comparison details):

[System.Runtime.InteropServices.DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(byte[] buffer);

static Guid NewSequentialGuid() {

    byte[] raw = new byte[16];
    if (UuidCreateSequential(raw) != 0)
        throw new System.ComponentModel.Win32Exception(System.Runtime.InteropServices.Marshal.GetLastWin32Error());

    byte[] fix = new byte[16];

    // reverse 0..3
    fix[0x0] = raw[0x3];
    fix[0x1] = raw[0x2];
    fix[0x2] = raw[0x1];
    fix[0x3] = raw[0x0];

    // reverse 4 & 5
    fix[0x4] = raw[0x5];
    fix[0x5] = raw[0x4];

    // reverse 6 & 7
    fix[0x6] = raw[0x7];
    fix[0x7] = raw[0x6];

    // all other are unchanged
    fix[0x8] = raw[0x8];
    fix[0x9] = raw[0x9];
    fix[0xA] = raw[0xA];
    fix[0xB] = raw[0xB];
    fix[0xC] = raw[0xC];
    fix[0xD] = raw[0xD];
    fix[0xE] = raw[0xE];
    fix[0xF] = raw[0xF];

    return new Guid(fix);
}

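
The byte shuffle in the C# snippet above is independent of the Windows API call, so it can be sketched and checked on its own. The Java version below is a hypothetical helper, not part of the original answer. It mirrors the same reordering: reverse bytes 0..3, swap 4 with 5, swap 6 with 7, and leave bytes 8..15 unchanged.

```java
import java.util.Arrays;

public final class GuidByteOrder {

    /** Applies the same reordering as the C# snippet to a 16-byte GUID buffer. */
    public static byte[] swapLeadingFields(byte[] raw) {
        if (raw.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + raw.length);
        }
        byte[] fix = Arrays.copyOf(raw, 16); // bytes 8..15 stay as they are

        // reverse 0..3
        fix[0] = raw[3];
        fix[1] = raw[2];
        fix[2] = raw[1];
        fix[3] = raw[0];

        // swap 4 & 5
        fix[4] = raw[5];
        fix[5] = raw[4];

        // swap 6 & 7
        fix[6] = raw[7];
        fix[7] = raw[6];

        return fix;
    }

    public static void main(String[] args) {
        byte[] raw = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(swapLeadingFields(raw)));
    }
}
```

Applying the shuffle twice returns the original bytes, which is a quick sanity check that only field-internal byte order is changed and no data is lost.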

审判长 2024-07-14 19:41:22

See this article:
(http://www.shirmanov.com/2010/05/generating-newsequentialid-compatible.html)

Even though MSSQL uses this same function ( UuidCreateSequential(out Guid guid) ) to generate its NEWSEQUENTIALID values, MSSQL reverses the 3rd and 4th byte patterns, so you do not get the same result as when calling this function from your own code. Shirmanov shows how to get exactly the same results that MSSQL would create.

隔岸观火 2024-07-14 19:41:22

I measured the difference between Guid (clustered and non-clustered), Sequential Guid and int (Identity/autoincrement) using Entity Framework. The Sequential Guid was surprisingly fast compared to the int with identity. Results and code of the Sequential Guid here.

热鲨 2024-07-14 19:41:22

If you need to use sequential GUIDs, SQL Server 2005 can generate them for you with the NEWSEQUENTIALID() function.

However, since the basic usage of GUIDs is to generate keys (or alternate keys) that cannot be guessed (for example, to avoid people passing guessed keys on GETs), I don't see how applicable they are, because they are so easily guessed.

From MSDN:

Important:
If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.
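
How easy the guessing can be is illustrated by the sketch below. It assumes a plain version 1 (time-based) UUID in canonical form, like the sample values quoted earlier in this thread, and simply adds one tick to the timestamp. Real generators do not always advance by exactly one tick, and NEWSEQUENTIALID() output has a different byte order, so treat this as an illustration rather than an attack recipe. The class and method names are made up for this example.

```java
import java.util.UUID;

public final class NextSequentialGuess {

    /** Returns the version 1 UUID whose timestamp is one tick (100 ns) later. */
    public static UUID next(UUID u) {
        long ts = u.timestamp() + 1; // throws UnsupportedOperationException unless version 1
        long msb = ((ts & 0xFFFFFFFFL) << 32)       // time_low
                | (((ts >>> 32) & 0xFFFFL) << 16)   // time_mid
                | 0x1000L                           // version 1
                | ((ts >>> 48) & 0x0FFFL);          // time_hi
        // clock sequence and node (the low 64 bits) are reused unchanged
        return new UUID(msb, u.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID seen = UUID.fromString("cc6466f7-1066-11dd-acb6-005056c00008");
        System.out.println("seen:  " + seen);
        System.out.println("guess: " + next(seen));
    }
}
```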

随梦而飞# 2024-07-14 19:41:22

OK, I finally got to this point in design and production myself.

I generate a COMB_GUID where the upper 32 bits are based on bits 33 through 1 of Unix time in milliseconds. So, there are 93 bits of randomness every 2 milliseconds, and the upper bits roll over every 2^33 milliseconds, which is roughly 99 days. The actual physical representation of the COMB_GUID (or type 4 UUID) is a base64 encoded version of the 128 bits, which is a 22 char string.

When inserting into Postgres, the speed comparison between a fully random UUID and a COMB_GUID favors the COMB_GUID.
The COMB_GUID is 2X faster on my hardware over multiple tests, each inserting one million records. The records contain the id (22 chars), a string field (110 chars), a double precision, and an INT.

In ElasticSearch, there is NO discernible difference between the two for indexing. I'm still going to use COMB_GUIDs in case the content goes to BTREE indexes anywhere in the chain: because the content is fed in time order, or can be presorted on the id field so that it IS time related and partially sequential, inserts will speed up.

Pretty interesting.
The Java code to make a COMB_GUID is below.

import java.nio.ByteBuffer;
import java.util.Base64; // only available in Java 8+
import java.util.UUID;

public class CombGuid {

    private final ByteBuffer babuffer = ByteBuffer.allocate((Long.SIZE / 8) * 2);
    // withoutPadding() drops the trailing "==" so that 16 bytes encode to 22 chars
    private final Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public String createId() {
        return uuid2base64(UUID.randomUUID());
    }

    public String uuid2base64(UUID uuid) {
        synchronized (this) {
            babuffer.putLong(0, uuid.getLeastSignificantBits());
            babuffer.putLong(8, uuid.getMostSignificantBits());

            long time = System.currentTimeMillis();
            time = time >> 1; // makes it every 2 milliseconds
            // keeps the low 32 bits; wraps every 2^32 two-millisecond ticks (about 99 days)
            int intFor32bits = (int) time;
            babuffer.putInt(0, intFor32bits); // overwrite the first 4 bytes with the timestamp

            // Encode inside the lock so another thread cannot change the buffer mid-read.
            // encodeToString copies the bytes, so the shared buffer is not leaked.
            return encoder.encodeToString(babuffer.array());
        }
    }
}

墨落画卷 2024-07-14 19:41:22

Check out COMBs by Jimmy Nilsson: a type of GUID where a number of bits have been replaced with a timestamp-like value. This means that COMBs can be ordered, and when used as a primary key they result in fewer index page splits when inserting new values.
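
The idea can be sketched in a few lines of Java. The layout here is an assumption made for illustration, not necessarily the one from Nilsson's article: the leading 48 bits of a random UUID are replaced with a millisecond timestamp so that the canonical string form sorts by creation time. The class and method names are made up, and the timestamp is assumed to fit in 48 bits.

```java
import java.util.UUID;

public final class CombUuidSketch {

    /**
     * Builds a COMB-like value: 48 bits of timestamp followed by 80 bits taken
     * from the given random UUID. Assumes epochMillis fits in 48 bits.
     */
    public static UUID comb(long epochMillis, UUID random) {
        long msb = (epochMillis << 16) | (random.getMostSignificantBits() & 0xFFFFL);
        return new UUID(msb, random.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        UUID first = comb(now, UUID.randomUUID());
        UUID second = comb(now + 1, UUID.randomUUID());
        System.out.println(first);
        System.out.println(second);
        // the later value sorts after the earlier one in string order
        System.out.println(first.toString().compareTo(second.toString()) < 0);
    }
}
```

Which bytes should carry the timestamp depends on how the target database compares GUIDs, so the position used here would need to be adapted to the database in question.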

See also: Is it OK to use a uniqueidentifier (GUID) as a Primary Key?

Yes, a uniqueidentifier (GUID) column can be fine as a Primary Key, BUT it is not a particularly good choice for the clustered index. In many cases, you will be better off creating the clustered index on a column (or columns) that is likely to be used in range searches, and creating a non-clustered index on the GUID column.
