SQL Server legacy database: should the tables be converted to use a clustered index?

Posted on 2024-09-15 12:30:26

We have a legacy SQL Server database (2005 and 2008).

All of the primary keys in the tables are UniqueIdentifiers.

The tables currently have no clustered indexes, and we are running into performance issues on tables with only 750k records. This is the first database I've worked on with unique identifiers as the sole primary key, and I've never seen SQL Server be this slow at returning data.

I don't want to create a clustered index on the uniqueidentifier columns, as the values are not sequential and would therefore slow the apps down when inserting data.

We cannot remove the uniqueidentifier as that is used for remote site record identity management purposes.

I had thought about adding a big integer identity column to the tables and creating the clustered index on this column and including the unique identifier column.

i.e.

int identity - First column to maintain insert speeds
unique identifier - To ensure the application keeps working as expected.
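
A minimal T-SQL sketch of this proposal, assuming a hypothetical table `dbo.Orders` whose existing GUID primary key `OrderId` is nonclustered. The table, column and index names are illustrative and not from the real schema. It clusters on the new identity column alone rather than on both columns, because the existing nonclustered primary key index already maps each GUID to its row.

```sql
-- Hypothetical table dbo.Orders with a nonclustered GUID primary key (OrderId).
-- Add a narrow, ever-increasing surrogate column; existing rows are numbered automatically.
ALTER TABLE dbo.Orders
    ADD RowId BIGINT IDENTITY(1, 1) NOT NULL;

-- Cluster on the surrogate only. Converting a heap to a clustered table
-- rebuilds every nonclustered index on it, so run this in a maintenance window.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_RowId
    ON dbo.Orders (RowId);
```

After this change, each nonclustered index carries `RowId` as its row locator, so a lookup by GUID becomes a seek on the primary key index followed by a seek on the clustered index.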

The goal is to improve the identity query and joined table query performance.

Q1: Will this improve the query performance of the db or will it slow it down?

Q2: Is there an alternative to this that I haven't listed?

Thanks
Pete

Edit: The performance issues are on retrieving data quickly through select statements, especially if a few of the more "transactional / changing" tables are joined together.

Edit 2: Joins between tables are generally between primary and foreign keys. On tables that have foreign keys, those columns are included in the non-clustered indexes to make them more covering.

None of the tables has another column that would make a good clustered index.

I'm leaning more towards adding an additional identity column on each of the high load tables and then including the current Guid PK column within the clustered index to provide the best query performance.

Edit 3:
I would estimate that 80% of the queries are performed on primary and foreign keys alone through the data access mechanism. Our data model generally uses lazy-loaded objects that run their query when accessed; these queries filter on the object's id via the PK column. We also have a large number of user-driven data exclusion/inclusion queries that use the foreign key columns as filters, along the lines of "for type X, exclude the following IDs". The remaining 20% are where clauses on enum (int) or date range columns. Very few text-based queries are performed in the system.

Where possible I have already added covering indexes for the heaviest queries, but so far I'm still disappointed by the performance. As bluefooted says, the data is being stored as a heap.

Comments (4)

抱猫软卧 2024-09-22 12:30:26

If you don't have a clustered index on the table, it is being stored as a heap rather than a B-tree. Heap data access is absolutely atrocious in SQL Server, so you definitely need to add a clustered index.

I agree with your analysis that the GUID column is a poor choice for clustering, especially since you don't have the ability to use NEWSEQUENTIALID(). You could create a new artificial integer key if you like, but if there is another column or combination of columns that would make sense as a clustered index, that is fine as well.

Do you have a field that is used frequently for range scans? Which columns are used for joins? Is there a combination of columns that also uniquely identifies the row aside from the GUID? Posting a sample of the data model would help us to suggest a good candidate for clustering.
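
To see which tables are actually stored as heaps, and how large they are, a catalog query along these lines may help. It is only a sketch: it relies on the `sys.tables`, `sys.schemas`, `sys.indexes` and `sys.partitions` catalog views (available from SQL Server 2005 onward) and must be run in the database being investigated.

```sql
-- List user tables stored as heaps, largest first.
SELECT s.name      AS schema_name,
       t.name      AS table_name,
       SUM(p.rows) AS row_count
FROM sys.tables     AS t
JOIN sys.schemas    AS s ON s.schema_id = t.schema_id
JOIN sys.indexes    AS i ON i.object_id = t.object_id
JOIN sys.partitions AS p ON p.object_id = i.object_id
                        AND p.index_id  = i.index_id
WHERE i.type = 0            -- index type 0 means HEAP (no clustered index)
GROUP BY s.name, t.name
ORDER BY row_count DESC;
```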

黑白记忆 2024-09-22 12:30:26

I'm not sure where your GUIDs come from, but if they're being generated during the insert, using NEWSEQUENTIALID() in SQL Server instead of NEWID() will help you avoid fragmentation issues during insert.
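
If the GUIDs are generated by the server rather than by the client or a remote site, switching to NEWSEQUENTIALID() is a matter of changing the column default. A sketch, assuming a hypothetical table `dbo.Orders` with a uniqueidentifier column `OrderId` that has no default constraint yet (an existing NEWID() default would have to be dropped first):

```sql
-- NEWSEQUENTIALID() can only be used in a DEFAULT constraint
-- on a uniqueidentifier column; it cannot be called in a query.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT DF_Orders_OrderId
    DEFAULT NEWSEQUENTIALID() FOR OrderId;
```

Rows inserted without an explicit `OrderId` then receive increasing values, while rows that supply their own GUID are unaffected.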

Regarding the choice of a clustered index, as Kimberly L. Tripp states here: "the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits to minimizing splits)." A GUID falls short on the narrow requirement when compared to an INT or even BIGINT.

Kimberly also has an excellent article on GUIDs as PRIMARY KEYs and/or the clustering key.

奢望 2024-09-22 12:30:26

It's not 100% clear to me: is your number 1 access pattern to query the tables by the GUID or by other columns? And when joining to other tables, what columns (and data types) are most often used?

I can't really give you any solid recommendations until I understand more about how these GUIDs are used. I realize you said they're primary keys, but that doesn't guarantee they are used as the primary conditions on queries or in joins.

UPDATE

Now that I know a little more, I have a crazy suggestion. Do cluster those tables on the GUIDs, but set the fill factor to 60%. This will ameliorate the page split problem and give you better performance querying on those puppies.
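
A sketch of what that could look like, assuming a hypothetical table `dbo.Orders` with a nonclustered primary key constraint named `PK_Orders` on the GUID column `OrderId`. Any foreign keys that reference the primary key would have to be dropped before this and re-created afterwards, which is omitted here.

```sql
-- Replace the nonclustered primary key with a clustered one,
-- leaving 40% free space on each leaf page for future random inserts.
ALTER TABLE dbo.Orders
    DROP CONSTRAINT PK_Orders;

ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders
    PRIMARY KEY CLUSTERED (OrderId)
    WITH (FILLFACTOR = 60);

-- Fill factor is only applied when an index is created or rebuilt,
-- so the free space has to be restored periodically.
ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 60);
```

The trade-off is that pages are only 60% full after a rebuild, so the table occupies more pages on disk and in memory.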

As for using Guid.NewGuid(), it seems that you can generate sequential GUIDs in C# after all. I found the following code here on SO:

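using System;
using System.Runtime.InteropServices;

// GuidHelper is an illustrative wrapper class name, not part of the original snippet.
public static class GuidHelper
{
    // Windows RPC runtime call that generates a sequential UUID.
    [DllImport("rpcrt4.dll", SetLastError = true)]
    static extern int UuidCreateSequential(out Guid guid);

    public static Guid SequentialGuid()
    {
        const int RPC_S_OK = 0;
        Guid g;
        // Fall back to a random GUID if the native call does not report success.
        if (UuidCreateSequential(out g) != RPC_S_OK)
            return Guid.NewGuid();
        return g;
    }
}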

NEWSEQUENTIALID() is actually just a wrapper for UuidCreateSequential. I'm sure if you can't use this directly on the client you can figure out a way to make a quick round-trip to the server to get a new sequential ID from there, perhaps even with a "dispenser" table and a stored procedure to do the job.
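
One possible shape for such a dispenser, as a rough sketch only: the names `dbo.GuidDispenser` and `dbo.GetSequentialGuid` are made up for illustration. Generating the value on the server may also matter for ordering: as far as I know, NEWSEQUENTIALID() reorders some bytes of the UuidCreateSequential result, so GUIDs produced by the raw client-side call may not sort sequentially under SQL Server's uniqueidentifier comparison.

```sql
-- Hypothetical dispenser table: its only job is to let the server mint a GUID.
CREATE TABLE dbo.GuidDispenser
(
    Id UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_GuidDispenser_Id DEFAULT NEWSEQUENTIALID()
);
GO  -- batch separator understood by SSMS and sqlcmd

CREATE PROCEDURE dbo.GetSequentialGuid
    @NewId UNIQUEIDENTIFIER OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ids TABLE (Id UNIQUEIDENTIFIER NOT NULL);

    -- Insert a row of defaults and capture the generated value.
    INSERT INTO dbo.GuidDispenser
    OUTPUT inserted.Id INTO @ids (Id)
    DEFAULT VALUES;

    SELECT @NewId = Id FROM @ids;

    -- The row has served its purpose; keep the dispenser table empty.
    DELETE FROM dbo.GuidDispenser WHERE Id = @NewId;
END;
GO
```

A caller would declare a uniqueidentifier variable and run `EXEC dbo.GetSequentialGuid @NewId = @g OUTPUT;` to receive the new value. The extra round-trip per ID is the cost of this approach.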

水中月 2024-09-22 12:30:26

You don't indicate what your performance issues are. If the worst-performing action is an INSERT, then maybe your solution is right. If it's something else, then I'd look at how the clustered index can help with that.

You might look at the existing indexes on the table and the queries that use them. You may be able to select an index that, while it degrades INSERTs slightly, provides a greater benefit to the current performance-problem areas.
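
One way to review how the existing indexes are being used is the `sys.dm_db_index_usage_stats` dynamic management view (SQL Server 2005 and later). The query below is a sketch; the counters are reset when the instance restarts, so they only reflect activity since then.

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) per index
-- for user tables in the current database.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.type_desc,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, index_name;
```

Indexes with many updates but few seeks or scans are candidates for removal, while a heavily used nonclustered index can hint at a good clustering key.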
