我在访问我的一张表时遇到了巨大的缓慢,我需要一些重构建议。很抱歉,如果这不是此类事情的正确区域。
我正在开展一个项目,旨在报告我们内部服务器的服务器性能统计数据。我每天晚上都会处理 Windows 性能日志(12 个服务器、10 个性能计数器,每 15 秒记录一次)。我将数据存储在表中,如下所示:
CREATE TABLE [dbo].[log](
[id] [int] IDENTITY(1,1) NOT NULL,
[logfile_id] [int] NOT NULL,
[test_id] [int] NOT NULL,
[timestamp] [datetime] NOT NULL,
[value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
当前有 16,529,131 行,并且它将继续增长。
我访问数据以生成报告并从 Coldfusion 创建图表,如下所示:(
SET NOCOUNT ON
CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )
INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#"
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc
select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1))
order by l.timestamp asc
DROP TABLE ##RowNumber
SET NOCOUNT OFF
对于非 CF 开发者 #value#
插入 value
和 ##
映射到 #
)
我基本上创建了一个临时表,以便我可以使用行号来选择每 x 行。通过这种方式,我只选择可以显示的行数。这有帮助,但仍然很慢。
SQL Server Management Studio 告诉我我的索引如下(我几乎不知道如何正确使用索引):
IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)
我将非常感谢任何可以提供一些建议来帮助我加快速度的人。我不介意重新组织事物,并且我可以完全控制该项目(尽管可能不控制服务器硬件)。
干杯(抱歉帖子太长)
I'm experiencing massive slowness when accessing one of my tables and I need some re-factoring advice. Sorry if this is not the correct area for this sort of thing.
I'm working on a project that aims to report on server performance statistics for our internal servers. I'm processing windows performance logs every night (12 servers, 10 performance counters and logging every 15 seconds). I'm storing the data in a table as follows:
CREATE TABLE [dbo].[log](
[id] [int] IDENTITY(1,1) NOT NULL,
[logfile_id] [int] NOT NULL,
[test_id] [int] NOT NULL,
[timestamp] [datetime] NOT NULL,
[value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
There's currently 16,529,131 rows and it will keep on growing.
I access the data to produce reports and create graphs from coldfusion like so:
SET NOCOUNT ON
CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )
INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#"
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc
select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1))
order by l.timestamp asc
DROP TABLE ##RowNumber
SET NOCOUNT OFF
(for not CF devs #value#
inserts value
and ##
maps to #
)
I basically create a temporary table so that I can use the rownumber to select every x rows. In this way I'm only selecting the amount of rows I can display. this helps but it's still very slow.
SQL Server Management Studio tells me my index's are as follows (I have pretty much no knowledge about using index's properly):
IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)
I would be very grateful to anyone who could give some advice that could help me speed things up a bit. I don't mind re-organising things and I have complete control of the project (perhaps not over the server hardware though).
Cheers (sorry for the long post)
发布评论
评论(3)
您的问题是您选择了错误的聚集键。没有人对通过 ID 检索某一特定日志值感兴趣。如果您的系统与我见过的任何其他系统一样,那么所有查询都会要求:
考虑到表的大小,所有非聚集索引都是无用的。他们都将达到指数临界点,保证,所以它们可能不存在。我假设您的所有非聚集索引都被定义为名称中字段的简单索引,没有包含字段。
我会假装我真的知道你的要求。您必须忘记有关存储的常识,并实际上在每个非聚集索引中复制所有数据。这是我的建议:
聚集索引涵盖查询“特定机器上特定计数器值的历史记录”。非聚集索引涵盖了各种其他可能的查询(随着时间的推移,机器上的所有计数器、随着时间的推移跨所有机器的特定计数器等)。
您注意到我没有对您的查询脚本发表任何评论。这是因为您无法采取任何措施来使查询在您拥有的表结构上运行得更快。
现在你不应该做的一件事就是实际执行我的建议。我说过我要假装我知道您的要求。但我其实不这么认为。我只是举了一个可能的结构的例子。您真正应该做的是研究该主题并找出适合您要求的正确索引结构:
此外,谷歌搜索“覆盖索引”会带来很多好文章。
当然,归根结底,存储并不是免费的,因此您必须在每种可能的组合上拥有非聚集索引的要求与控制数据库大小的需要之间取得平衡。幸运的是,您有一个非常小且狭窄的表,因此在许多非聚集索引上复制它并不是什么大问题。另外,我不会担心插入性能,15 秒内有 120 个计数器,每个计数器意味着每秒 8-9 次插入,这没什么。
Your problem is that you chose a bad clustered key. Nobody is ever interested in retrieving one particular log value by ID. I your system is like anything else I've seen, then all queries are going to ask for:
Given the size of the table, all your non-clustered indexes are useless. They are all going to hit the index tipping point, guaranteed, so they might just as well not exists. I assume all your non-clustered index are defined as a simple index over the field in the name, with no include fields.
I'm going to pretend I actually know your requirements. You must forget common sense about storage and actually duplicate all your data in every non-clustered index. Here is my advice:
The clustered index covers the query 'history of specific counter value at a specific machine'. The non clustered indexes cover various other possible queries (all counters at a machine over time, specific counter across all machines over time etc).
You notice I did not comment anything about your query script. That is because there isn't anything in the world you can do to make the queries run faster over the table structure you have.
Now one thing you shouldn't do is actually implement my advice. I said I'm going to pretend I know your requirements. But I actually don't. I just gave an example of a possible structure. What you really should do is study the topic and figure out the correct index structure for your requirements:
Also a google on 'covering index' will bring up a lot of good articles.
And of course, at the end of the day storage is not free so you'll have to balance the requirement to have a non-clustered index on every possible combination with the need to keep the size of the database in check. Luckly you have a very small and narrow table, so duplicating it over many non-clustered index is no big deal. Also I wouldn't be concerned about insert performance, 120 counters at 15 seconds each means 8-9 inserts per second, which is nothing.
我想到了几件事。
您需要保留那么多数据吗?如果没有,如果您想保留它,请考虑创建一个存档表(但不要创建它只是为了每次运行查询时将其与主表连接)。
我会避免使用包含如此多数据的临时表。请参阅这篇有关临时表性能以及如何避免使用它们的文章。
http://www.sql-server-performance.com/articles/ per/衍生_temp_tables_p1.aspx
http://www.sql-server-performance.com/tips/covering_indexes_p1。 aspx
编辑
请参阅此链接了解信息 http://technet.microsoft.com/en-us /library/cc966523.aspx
另外,请注意,我已将这些项目编号为 1,2,3,4,但编辑器会自动重置它们
A couple things come to mind.
Do you need to keep that much data? If not, consider either creating an archive table if you want to keep it (but don't create it just to join it with the primary table every time you run a query).
I would avoid using a temp table with so much data. See this article on temp table performance and how to avoid using them.
http://www.sql-server-performance.com/articles/per/derived_temp_tables_p1.aspx
http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx
Edit
DBCC SHOWCONTIG
command.See this link for info http://technet.microsoft.com/en-us/library/cc966523.aspx
Also, please note that I have numbered these items as 1,2,3,4 however the editor is automatically resetting them
有一次,当我还在使用sql server 2000时,我需要做一些分页,我遇到了一种真正让我大吃一惊的分页方法。看看这个方法。
举个例子
Once when still working with sql server 2000, i needed to do some paging, and i came accross a method of paging that realy blew my mind. Have a look at this method.
As an example