删除主键(聚集索引)以提高插入性能

发布于 2024-12-02 09:37:37 字数 924 浏览 1 评论 0原文

我们一直在经历 SQL 超时,并已确定该瓶颈是审核表 - 我们系统中的所有表都包含插入、更新和删除触发器,这些触发器会导致新的审核记录。

这意味着审计表是系统中最大且最繁忙的表。然而数据只进入,永远不会出来(在这个系统下),所以不需要 select 性能。

运行 select top 10 返回最近插入的记录,而不是“第一个”记录。当然,order by 可以工作,但我希望选择顶部应该根据它们在光盘上的顺序返回行 - 我希望这会返回最低的 PK 值。

有人建议我们放弃聚集索引,实际上也放弃主键(唯一约束)。正如我之前提到的,无需在该系统中从该表中选择

聚集索引会对表造成什么样的性能影响?拥有无索引、非集群、无键表会产生哪些(非选择)后果?还有其他建议吗?

编辑

我们的审计涉及 CLR 功能,我现在正在使用 & 进行基准测试。无需 PK、索引、FK 等即可确定 CLR 函数和 CLR 函数的相对成本。的限制条件。

经过调查,性能不佳与 insert 语句无关,而是与协调审核的 CLR 函数有关。删除 CLR 并使用直接的 TSQL 过程后,性能提高了 20 倍。

在测试过程中,我还确定聚集索引和标识列对插入时间几乎没有影响,至少相对于发生的任何其他处理而言是这样。

// updating 10k rows in a table with trigger

// using CLR function
PK (identity, clustered)- ~78000ms
No PK, no index - ~81000ms

// using straight TSQL
PK (identity, clustered) - 2174ms
No PK, no index - 2102ms

We've been experiencing SQL timeouts and have identified that bottleneck to be an audit table - all tables in our system contain insert, update and delete triggers which cause a new audit record.

This means that the audit table is the largest and busiest table in the system. Yet data only goes in, and never comes out (under this system) so no select performance is required.

Running a select top 10 returns recently insert records rather than the 'first' records. order by works, of course, but I would expect that a select top should return rows based on their order on the disc - which I'd expect would return the lowest PK values.

It's been suggested that we drop the clustered index, and in fact the primary key (unique constraint) as well. As I mentioned earlier there's no need to select from this table within this system.

What sort of performance hit does a clustered index create on a table? What are the (non-select) ramifications of having an unindexed, unclustered, key-less table? Any other suggestions?

edit

our auditing involves CLR functions and I am now benchmarking with & without PK, indexes, FKs etc to determine the relative cost of the CLR functions & the contraints.

After investigation, the poor performance was not related to the insert statements but instead the CLR function which orchestrated the auditing. After removing the CLR and instead using a straight TSQL proc, performance improved 20-fold.

During the testing I've also determined that the clustered index and identity columns make little or no difference to the insert time, at least relative to any other processing that takes place.

// updating 10k rows in a table with trigger

// using CLR function
PK (identity, clustered)- ~78000ms
No PK, no index - ~81000ms

// using straight TSQL
PK (identity, clustered) - 2174ms
No PK, no index - 2102ms

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

明月夜 2024-12-09 09:37:37

根据索引女王 Kimberly Tripp 的说法,在表上拥有聚集索引实际上有助于提高 INSERT 性能:

聚集索引争论仍在继续

  • 在集群表中插入速度更快(但仅限于“右侧”)
    聚簇表)与堆相比。这里的主要问题是
    在 IAM/PFS 中查找以确定堆中的插入位置
    比聚簇表慢(其中插入位置已知,
    由聚集键定义)。插入时插入速度更快
    定义顺序 (CL) 以及该顺序所在的表
    不断增加。

资料来源:名为 聚集索引争论仍在继续的博客文章... .

According to Kimberly Tripp - the Queen of Indexing - having a clustered index on a table actually helps INSERT performance:

The Clustered Index Debate Continued

  • Inserts are faster in a clustered table (but only in the "right"
    clustered table) than compared to a heap. The primary problem here is
    that lookups in the IAM/PFS to determine the insert location in a heap
    are slower than in a clustered table (where insert location is known,
    defined by the clustered key). Inserts are faster when inserted into a
    table where order is defined (CL) and where that order is
    ever-increasing.

Source: blog post called The Clustered Index Debate Continues....

巷子口的你 2024-12-09 09:37:37

Tibor Karaszi 的博客上提供了有关此场景的出色测试脚本和描述,网址为 SQLblog.com

我的数字与他的不完全匹配 - 我看到更多差异在批处理语句上比在每行语句上更容易。

当行数约为 100 万时,我相当一致地在聚集索引上执行单行插入循环,其执行速度略快于非索引索引(聚集索引的插入时间约为非索引索引的 97%)。

相反,批量插入(10000 行)到非索引而不是聚集索引(聚集插入时间的 75%-85%)更快。

clustered - loop        - 1689
heap      - loop        - 1713
clustered - one statement - 85
heap      - one statement - 62

他描述了每个插入中发生的情况:

堆: SQL Server 需要找到行应该存放的位置。为此它
使用一个或多个 IAM 页作为堆,并且交叉引用这些页
到数据库文件的一个或多个 PFS 页。国际海事组织,应该有
这里可能会产生显着的开销。甚至更多,与许多
用户敲击同一张桌子我可以想象阻塞(等待)
PFS 可能还有 IAM 页面。

聚集表:现在,这非常简单。 SQL Server 导航
聚集索引树并找到该行应该去的位置。既然这是
不断增加的索引键,每一行都会到达表的末尾
(链接列表)。

A great test script and description of this scenarion is available on Tibor Karaszi's blog at SQLblog.com

My numbers don't entirely match his - I see more difference on a batch statement than I do with per-row statements.

With the row count around one million I fairly consistently get a single-row insert loop on clustered index to perform slightly faster than on a non-indexed (clustered taking approximately 97% as long as non-indexed).

Conversely the batch insert (10000 rows) is faster into a non-indexed rather than clustered index (anything from 75%-85% of the clustered insert time).

clustered - loop        - 1689
heap      - loop        - 1713
clustered - one statement - 85
heap      - one statement - 62

He describes what's happening on each insert:

Heap: SQL Server need to find where the row should go. For this it
uses one or more IAM pages for the heap, and it cross references these
to one or more PFS pages for the database file(s). IMO, there should
be potential for a noticable overhead here. And even more, with many
users hammering the same table I can imagine blocking (waits) against
the PFS and possibly also IAM pages.

Clustered table: Now, this is dead simple. SQL server navigates the
clustered index tree and find where the row should go. Since this is
an ever increasing index key, each row will go to the end of the table
(linked list).

李不 2024-12-09 09:37:37

没有钥匙的桌子?甚至没有自动递增代理键? :(

只要键单调递增,插入时的索引维护应该很好——它只是“添加在末尾”。“聚集”只是意味着表的物理布局遵循索引(因为数据是索引的一部分)。只要索引不碎片化(参见单调递增位),那么集群本身/数据就不会在逻辑上碎片化,这应该是'不是性能问题。 (如果有更新,那么集群的情况会略有不同:更新的记录可能会“增长”并导致碎片。)

我的建议是,如果这是所选的路线,那么...对其进行基准测试现实的数据/负载,然后决定这样的建议是否合理。很高兴看到这一更改已被决定,以及为什么

快乐编码。


此外,除了 ORDER BY 设计上有缺陷,现在可能可以工作,但这是一个实现细节,可能会以微妙的方式发生变化(就像不同的查询计划一样简单)。使用自动增量键,ORDER BY DESC 将始终产生正确的结果(请记住,可以跳过自动增量 ID,但除非“重置”,否则它们将始终基于插入而增加命令)。

A table without a key? Not even an auto-incrementing surrogate key? :(

As long as the key is monotonically increasing the index maintenance upon insert should be good -- it's just "added at the end". The "clustered" just means the physical layout of the table follows the index (as the data is part of the index). As long as the index isn't fragmented (see monotonically increasing bit) then the cluster itself/data won't be logically fragmented and this shouldn't be a performance issue. (If there are updates then the clustering is a slightly different story: the record updated may "grow" and cause fragmentation.)

My suggestion is, if that is the chosen route then ... benchmark it with realistic data/load and then decide if such suggestions are warranted. It would be nice to see is this change was decided upon, and why.

Happy coding.


Also, any reliance upon order excepting that from an ORDER BY is flawed by design. It may work now, but it is an implementation detail and may change in subtle ways (as simple as a different query plan). With the auto-increment key, an ORDER BY DESC would always produce the correct result (bear in mind that auto-incremeent IDs can be skipped, but unless "reset" they will always be increasing based on insert order).

偏爱你一生 2024-12-09 09:37:37

我的原始理解是,即使使用聚簇索引进行 INSERT 操作通常也会比使用堆更快。此外,聚集索引的磁盘空间要求较低。

一些有趣的测试/场景可能会对您的特定情况有所启​​发:http:// technet.microsoft.com/en-us/library/cc917672.aspx

My primitive understanding is that even INSERT operations are usually faster with a clustered index, than with a heap. Additionally, disk-space requirements are lower with clustered indexes.

Some interesting tests / scenarios that might shed some light for your particular circumstance: http://technet.microsoft.com/en-us/library/cc917672.aspx.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文