第一个名为heap 的表在字段group 上获得了非聚集索引。第二个表名为 clust,在名为 key 的顺序字段上有一个聚集索引,在字段 group 上有一个非聚集索引。
测试在I5 M540 处理器,具有 2 个超线程核心、4Gb 内存和 64 位 Windows 7。
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Apr 2 2010 15:48:46
Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
int[] idList = new int[] { 6816588, 7086702, 6498815 ... }; // 1000 values here.
using (var conn = new SqlConnection(@"Data Source=myserver;Initial Catalog=mydb;Integrated Security=SSPI;"))
{
conn.Open();
using (var cmd = new SqlCommand())
{
cmd.Connection = conn;
cmd.CommandType = CommandType.Text;
cmd.CommandText = "select * from heap where common_key between @id and @id+1000";
cmd.Parameters.Add("@id", SqlDbType.Int);
cmd.Prepare();
foreach (int id in idList)
{
cmd.Parameters[0].Value = id;
using (var reader = cmd.ExecuteReader())
{
int count = 0;
while (reader.Read())
{
count++;
}
Console.WriteLine(String.Format("key: {0} => {1} rows", id, count));
}
}
}
}
更新于 2011 年 3 月 9 日结束。
SELECT 性能
为了检查性能数字,我对堆表和簇表执行了一次以下查询:
select * from heap/clust where group between 5678910 and 5679410
select * from heap/clust where group between 6234567 and 6234967
select * from heap/clust where group between 6455429 and 6455729
select * from heap/clust where group between 6655429 and 6655729
select * from heap/clust where group between 6955429 and 6955729
select * from heap/clust where group between 7195542 and 7155729
update heap/clust set amount = amount + 0 where group between 5678910 and 5679410
update heap/clust set amount = amount + 0 where group between 6234567 and 6234967
update heap/clust set amount = amount + 0 where group between 6455429 and 6455729
update heap/clust set amount = amount + 0 where group between 6655429 and 6655729
update heap/clust set amount = amount + 0 where group between 6955429 and 6955729
update heap/clust set amount = amount + 0 where group between 7195542 and 7155729
delete heap/clust where group between 5678910 and 5679410
delete heap/clust where group between 6234567 and 6234967
delete heap/clust where group between 6455429 and 6455729
delete heap/clust where group between 6655429 and 6655729
delete heap/clust where group between 6955429 and 6955729
delete heap/clust where group between 7195542 and 7155729
To check your request I created 2 tables following this scheme:
7.9 million records representing balance information.
an identity field counting from 1 to 7.9 million
a number field grouping the records in about 500k groups.
The first table called heap got a non clustered index on the field group. The second table called clust got a clustered index on the sequential field called key and a nonclustered index on the field group
The tests were run on an I5 M540 processor with 2 hyperthreaded cores, 4Gb memory and 64-bit windows 7.
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Apr 2 2010 15:48:46
Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
Update on 9 Mar 2011: I did a second more extensive benchmark by running the following .net code and logging Duration, CPU, Reads, Writes and RowCounts in Sql Server Profiler. (The CommandText used will be mentioned in the results.)
NOTE: CPU and Duration are expressed in milliseconds
1000 queries
zero CPU queries are eliminated from the results
0 rows affected are eliminated from the results
int[] idList = new int[] { 6816588, 7086702, 6498815 ... }; // 1000 values here.
using (var conn = new SqlConnection(@"Data Source=myserver;Initial Catalog=mydb;Integrated Security=SSPI;"))
{
conn.Open();
using (var cmd = new SqlCommand())
{
cmd.Connection = conn;
cmd.CommandType = CommandType.Text;
cmd.CommandText = "select * from heap where common_key between @id and @id+1000";
cmd.Parameters.Add("@id", SqlDbType.Int);
cmd.Prepare();
foreach (int id in idList)
{
cmd.Parameters[0].Value = id;
using (var reader = cmd.ExecuteReader())
{
int count = 0;
while (reader.Read())
{
count++;
}
Console.WriteLine(String.Format("key: {0} => {1} rows", id, count));
}
}
}
}
End of Update on 9 Mar 2011.
SELECT performance
To check performanc numbers I performed the following queries once on the heap table and once on the clust table:
select * from heap/clust where group between 5678910 and 5679410
select * from heap/clust where group between 6234567 and 6234967
select * from heap/clust where group between 6455429 and 6455729
select * from heap/clust where group between 6655429 and 6655729
select * from heap/clust where group between 6955429 and 6955729
select * from heap/clust where group between 7195542 and 7155729
The second batch of queries are update statements:
update heap/clust set amount = amount + 0 where group between 5678910 and 5679410
update heap/clust set amount = amount + 0 where group between 6234567 and 6234967
update heap/clust set amount = amount + 0 where group between 6455429 and 6455729
update heap/clust set amount = amount + 0 where group between 6655429 and 6655729
update heap/clust set amount = amount + 0 where group between 6955429 and 6955729
update heap/clust set amount = amount + 0 where group between 7195542 and 7155729
the third batch of queries I ran are delete statements
delete heap/clust where group between 5678910 and 5679410
delete heap/clust where group between 6234567 and 6234967
delete heap/clust where group between 6455429 and 6455729
delete heap/clust where group between 6655429 and 6655729
delete heap/clust where group between 6955429 and 6955729
delete heap/clust where group between 7195542 and 7155729
Although there are more logical reads going on when accessing the table with the clustered & the nonclustered index (while using the nonclustered index) the performance results are:
SELECT statements are comparable
UPDATE statements are faster with a clustered index in place
DELETE statements are faster with a clustered index in place
INSERT statements are faster with a clustered index in place
Of course my benchmark was very limited on a specific kind of table and with a very limited set of queries, but I think that based on this information we can already start saying that it is virtually always better to create a clustered index on your table.
Update on 9 Mar 2011:
As we can see from the added results, the conclusions on the limited tests were not correct in every case.
The results now indicate that the only statements which benefit from the clustered index are the update statements. The other statements are about 30% slower on the table with clustered index.
Some additional charts where I plotted the weighted duration per query for heap vs clust.
As you can see the performance profile for the insert statements is quite interesting. The spikes are caused by a few data points which take a lot longer to complete.
As Kimberly Tripp - the Queen of Indexing - explains quite nicely in her blog post The Clustered Index Debate continues..., having a clustering key on a database table pretty much speeds up all operations - not just SELECT.
SELECT are generally slower on a heap as compared to a clustered table, as long as you pick a good clustering key - something like an INT IDENTITY. If you use a really really bad clustering key, like a GUID or a compound key with lots of variable length components, then, but only then, a heap might be faster. But in that case, you really need to clean up your database design in the first place...
So in general, I don't think there's any point in a heap - pick a good, useful clustering key and you should benefit in all respects.
发布评论
评论(2)
为了检查您的请求,我按照此方案创建了 2 个表:
第一个名为
heap
的表在字段group
上获得了非聚集索引。第二个表名为clust
,在名为key
的顺序字段上有一个聚集索引,在字段group
上有一个非聚集索引。测试在I5 M540 处理器,具有 2 个超线程核心、4Gb 内存和 64 位 Windows 7。
SELECT 性能
为了检查性能数字,我对堆表和簇表执行了一次以下查询:
此基准测试的结果针对
堆
:对于表
clust
,结果是:SELECT WITH JOIN 性能
cmd.CommandText = "select * from heap/clust h join keys k on h.group = k.group where h.group between @id and @id+1000";
结果该基准测试的
堆
:873行有> 0 个 CPU 并影响超过 0 行
此基准测试的结果针对
clust
:865 行有 > 0 CPU 并影响超过 0 行的
UPDATE 性能
第二批查询是更新语句:
此基准测试的结果为
heap
:clust
的基准测试结果:DELETE 基准测试
我运行的第三批查询是删除语句
此基准测试的
堆
结果:clust
的基准测试结果:INSERT 基准测试
基准测试的最后一部分是插入语句的执行。
插入堆/簇(...)
价值观(...),
(...),
(...),
(...),
(...),
(...)
堆
的基准测试结果:clust
的基准测试结果:结论
虽然使用 clustered & 访问表时会发生更多的逻辑读取,但非聚集索引(使用非聚集索引时)的性能结果是:
INSERT 当然,我的基准测试非常限于特定类型的表和非常有限的查询集,但我认为基于这些信息,我们已经可以开始说,在表上创建聚集索引实际上总是更好。
从补充的结果可以看出,有限测试的结论并非在所有情况下都是正确的。
现在的结果表明,唯一受益于聚集索引的语句是更新语句。在具有聚集索引的表上,其他语句大约慢 30%。
我在一些附加图表中绘制了堆与簇的每个查询的加权持续时间。
如您所见,插入语句的性能概况非常有趣。峰值是由一些数据点引起的,这些数据点需要更长的时间才能完成。
To check your request I created 2 tables following this scheme:
The first table called
heap
got a non clustered index on the fieldgroup
. The second table calledclust
got a clustered index on the sequential field calledkey
and a nonclustered index on the fieldgroup
The tests were run on an I5 M540 processor with 2 hyperthreaded cores, 4Gb memory and 64-bit windows 7.
SELECT performance
To check performanc numbers I performed the following queries once on the heap table and once on the clust table:
The results of this benchmark are for the
heap
:for the table
clust
the results are:SELECT WITH JOIN performance
cmd.CommandText = "select * from heap/clust h join keys k on h.group = k.group where h.group between @id and @id+1000";
The results of this benchmark are for the
heap
:873 Rows have > 0 CPU and affect more than 0 rows
The results of this benchmark are for the
clust
:865 Rows have > 0 CPU and affect more than 0 rows
UPDATE performance
The second batch of queries are update statements:
the results of this benchmark for the
heap
:the results of this benchmark for the
clust
:DELETE benchmarks
the third batch of queries I ran are delete statements
The result of this benchmark for the
heap
:the result of this benchmark for the
clust
:INSERT benchmarks
The last part of the benchmark is the execution of insert statements.
insert into heap/clust (...)
values (...),
(...),
(...),
(...),
(...),
(...)
The result of this benchmark for the
heap
:The result of this benchmark for the
clust
:Conclusions
Although there are more logical reads going on when accessing the table with the clustered & the nonclustered index (while using the nonclustered index) the performance results are:
Of course my benchmark was very limited on a specific kind of table and with a very limited set of queries, but I think that based on this information we can already start saying that it is virtually always better to create a clustered index on your table.
As we can see from the added results, the conclusions on the limited tests were not correct in every case.
The results now indicate that the only statements which benefit from the clustered index are the update statements. The other statements are about 30% slower on the table with clustered index.
Some additional charts where I plotted the weighted duration per query for heap vs clust.
As you can see the performance profile for the insert statements is quite interesting. The spikes are caused by a few data points which take a lot longer to complete.
正如索引女王 Kimberly Tripp 在她的博客文章集群中很好地解释的那样索引争论仍在继续...,数据库表上有一个集群键几乎可以加快所有操作的速度,而不仅仅是
SELECT
。与聚簇表相比,SELECT 在堆上的速度通常较慢,只要您选择一个好聚簇键 - 类似于
INT IDENTITY
。如果您使用非常糟糕的集群键,例如 GUID 或具有大量可变长度组件的复合键,那么,但只有这样,堆可能会更快。但在这种情况下,你确实需要首先清理你的数据库设计......所以总的来说,我认为堆没有任何意义 - 选择一个好的、有用的集群键,你应该从所有方面受益尊重。
As Kimberly Tripp - the Queen of Indexing - explains quite nicely in her blog post The Clustered Index Debate continues..., having a clustering key on a database table pretty much speeds up all operations - not just
SELECT
.SELECT are generally slower on a heap as compared to a clustered table, as long as you pick a good clustering key - something like an
INT IDENTITY
. If you use a really really bad clustering key, like a GUID or a compound key with lots of variable length components, then, but only then, a heap might be faster. But in that case, you really need to clean up your database design in the first place...So in general, I don't think there's any point in a heap - pick a good, useful clustering key and you should benefit in all respects.