当前位置：文江博客话题详情

coldfusion sql-server sql-server-2000

大表建议 (SQL Server)

发布于 2024-08-11 10:01:32 字数 1797 浏览 7 评论 0 原文

我在访问我的一张表时遇到了巨大的缓慢，我需要一些重构建议。很抱歉，如果这不是此类事情的正确区域。

我正在开展一个项目，旨在报告我们内部服务器的服务器性能统计数据。我每天晚上都会处理 Windows 性能日志（12 个服务器、10 个性能计数器，每 15 秒记录一次）。我将数据存储在表中，如下所示：

CREATE TABLE [dbo].[log](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [logfile_id] [int] NOT NULL,
    [test_id] [int] NOT NULL,
    [timestamp] [datetime] NOT NULL,
    [value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

当前有 16,529,131 行，并且它将继续增长。

我访问数据以生成报告并从 Coldfusion 创建图表，如下所示：（

SET NOCOUNT ON

CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )

INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#"
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc

select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1)) 
order by l.timestamp asc

DROP TABLE ##RowNumber

SET NOCOUNT OFF

对于非 CF 开发者 #value# 插入 value 和 ## 映射到 #）

我基本上创建了一个临时表，以便我可以使用行号来选择每 x 行。通过这种方式，我只选择可以显示的行数。这有帮助，但仍然很慢。

SQL Server Management Studio 告诉我我的索引如下（我几乎不知道如何正确使用索引）：

IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)

我将非常感谢任何可以提供一些建议来帮助我加快速度的人。我不介意重新组织事物，并且我可以完全控制该项目（尽管可能不控制服务器硬件）。

干杯（抱歉帖子太长）

原文

I'm experiencing massive slowness when accessing one of my tables and I need some re-factoring advice. Sorry if this is not the correct area for this sort of thing.

I'm working on a project that aims to report on server performance statistics for our internal servers. I'm processing windows performance logs every night (12 servers, 10 performance counters and logging every 15 seconds). I'm storing the data in a table as follows:

CREATE TABLE [dbo].[log](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [logfile_id] [int] NOT NULL,
    [test_id] [int] NOT NULL,
    [timestamp] [datetime] NOT NULL,
    [value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

There's currently 16,529,131 rows and it will keep on growing.

I access the data to produce reports and create graphs from coldfusion like so:

SET NOCOUNT ON

CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )

INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#"
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc

select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1)) 
order by l.timestamp asc

DROP TABLE ##RowNumber

SET NOCOUNT OFF

(for not CF devs #value# inserts value and ## maps to #)

I basically create a temporary table so that I can use the rownumber to select every x rows. In this way I'm only selecting the amount of rows I can display. this helps but it's still very slow.

SQL Server Management Studio tells me my index's are as follows (I have pretty much no knowledge about using index's properly):

IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)

I would be very grateful to anyone who could give some advice that could help me speed things up a bit. I don't mind re-organising things and I have complete control of the project (perhaps not over the server hardware though).

Cheers (sorry for the long post)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鸵鸟症 2024-08-18 10:01:32

您的问题是您选择了错误的聚集键。没有人对通过 ID 检索某一特定日志值感兴趣。如果您的系统与我见过的任何其他系统一样，那么所有查询都会要求：

某个日期范围内所有服务器的所有计数器某个日期
范围内所有服务器的特定计数器值某个范围
内一台服务器的所有计数器日期
范围内特定服务器的特定计数器

考虑到表的大小，所有非聚集索引都是无用的。他们都将达到指数临界点，保证，所以它们可能不存在。我假设您的所有非聚集索引都被定义为名称中字段的简单索引，没有包含字段。

我会假装我真的知道你的要求。您必须忘记有关存储的常识，并实际上在每个非聚集索引中复制所有数据。这是我的建议：

删除 [id] 上的聚集索引，这是无用的。
使用聚集索引（logfile_it、test_id、时间戳）组织表。
(test_id, logfile_id, timestamp) 上的非聚集索引包含（值）
（logfile_id, 时间戳）上的 NC 索引包含（值）
（test_id, 时间戳）上的 NC 索引包含（值）
（时间戳）上的 NC 索引包含（值）
添加维护任务以定期重新组织所有索引，因为它们容易产生碎片。

聚集索引涵盖查询“特定机器上特定计数器值的历史记录”。非聚集索引涵盖了各种其他可能的查询（随着时间的推移，机器上的所有计数器、随着时间的推移跨所有机器的特定计数器等）。

您注意到我没有对您的查询脚本发表任何评论。这是因为您无法采取任何措施来使查询在您拥有的表结构上运行得更快。

现在你不应该做的一件事就是实际执行我的建议。我说过我要假装我知道您的要求。但我其实不这么认为。我只是举了一个可能的结构的例子。您真正应该做的是研究该主题并找出适合您要求的正确索引结构：

此外，谷歌搜索“覆盖索引”会带来很多好文章。

当然，归根结底，存储并不是免费的，因此您必须在每种可能的组合上拥有非聚集索引的要求与控制数据库大小的需要之间取得平衡。幸运的是，您有一个非常小且狭窄的表，因此在许多非聚集索引上复制它并不是什么大问题。另外，我不会担心插入性能，15 秒内有 120 个计数器，每个计数器意味着每秒 8-9 次插入，这没什么。

Your problem is that you chose a bad clustered key. Nobody is ever interested in retrieving one particular log value by ID. I your system is like anything else I've seen, then all queries are going to ask for:

all counters for all servers over a range of dates
specific counter values over all servers for a range of dates
all counters for one server over a range of dates
specific counter for specific server over a range of dates

Given the size of the table, all your non-clustered indexes are useless. They are all going to hit the index tipping point, guaranteed, so they might just as well not exists. I assume all your non-clustered index are defined as a simple index over the field in the name, with no include fields.

I'm going to pretend I actually know your requirements. You must forget common sense about storage and actually duplicate all your data in every non-clustered index. Here is my advice:

Drop the clustered index on [id], is a as useless as is it gets.
Organize the table with a clustered index (logfile_it, test_id, timestamp).
Non-clusterd index on (test_id, logfile_id, timestamp) include (value)
NC index on (logfile_id, timestamp) include (value)
NC index on (test_id, timestamp) include (value)
NC index on (timestamp) include (value)
Add maintenance tasks to reorganize all indexes periodically as they are prone to fragmentation

The clustered index covers the query 'history of specific counter value at a specific machine'. The non clustered indexes cover various other possible queries (all counters at a machine over time, specific counter across all machines over time etc).

You notice I did not comment anything about your query script. That is because there isn't anything in the world you can do to make the queries run faster over the table structure you have.

Now one thing you shouldn't do is actually implement my advice. I said I'm going to pretend I know your requirements. But I actually don't. I just gave an example of a possible structure. What you really should do is study the topic and figure out the correct index structure for your requirements:

Also a google on 'covering index' will bring up a lot of good articles.

And of course, at the end of the day storage is not free so you'll have to balance the requirement to have a non-clustered index on every possible combination with the need to keep the size of the database in check. Luckly you have a very small and narrow table, so duplicating it over many non-clustered index is no big deal. Also I wouldn't be concerned about insert performance, 120 counters at 15 seconds each means 8-9 inserts per second, which is nothing.

回复收藏 0 原文

紧拥背影 2024-08-18 10:01:32

我想到了几件事。

您需要保留那么多数据吗？如果没有，如果您想保留它，请考虑创建一个存档表（但不要创建它只是为了每次运行查询时将其与主表连接）。
我会避免使用包含如此多数据的临时表。请参阅这篇有关临时表性能以及如何避免使用它们的文章。

http://www.sql-server-performance.com/articles/ per/衍生_temp_tables_p1.aspx

看起来您缺少 server_id 字段上的索引。我会考虑使用此字段和其他字段创建覆盖索引。这里还有一篇关于这一点的文章。

http://www.sql-server-performance.com/tips/covering_indexes_p1。 aspx

编辑

在如此短的时间内表中有这么多行，我还会检查索引是否存在碎片，这可能是导致速度缓慢的原因。在 SQL Server 2000 中，您可以使用 DBCC SHOWCONTIG 命令。

请参阅此链接了解信息 http://technet.microsoft.com/en-us /library/cc966523.aspx

另外，请注意，我已将这些项目编号为 1,2,3,4，但编辑器会自动重置它们

回复收藏 0 原文

混浊又暗下来 2024-08-18 10:01:32

有一次，当我还在使用sql server 2000时，我需要做一些分页，我遇到了一种真正让我大吃一惊的分页方法。看看这个方法。

DECLARE @Table TABLE(
        TimeVal DATETIME
)

DECLARE @StartVal INT
DECLARE @EndVal INT

SELECT  @StartVal = 51, @EndVal = 100

SELECT  *
FROM    (
            SELECT  TOP (@EndVal - @StartVal + 1)
                    *
            FROM    (
                        --select up to end number
                        SELECT  TOP (@EndVal)
                                *
                        FROM    @Table
                        ORDER BY TimeVal ASC
                    ) PageReversed
            ORDER BY TimeVal DESC
        ) PageVals
ORDER BY TimeVal ASC

举个例子

SELECT  *
FROM    (
            SELECT  TOP (@EndVal - @StartVal + 1)
                    *
            FROM    (
                        SELECT TOP (@EndVal)
                                l.id,
                                l.timestamp
                        FROM log l, logfile lf
                        WHERE lf.server_id = #arguments.server_id#
                        and l.test_id = #arguments.test_id#"
                        and l.timestamp >= #arguments.report_from#
                        and l.timestamp < #arguments.report_to#
                        and l.logfile_id = lf.id
                        order by l.timestamp asc
                    ) PageReversed ORDER BY timestamp DESC
        ) PageVals
ORDER BY timestamp ASC

Once when still working with sql server 2000, i needed to do some paging, and i came accross a method of paging that realy blew my mind. Have a look at this method.

DECLARE @Table TABLE(
        TimeVal DATETIME
)

DECLARE @StartVal INT
DECLARE @EndVal INT

SELECT  @StartVal = 51, @EndVal = 100

SELECT  *
FROM    (
            SELECT  TOP (@EndVal - @StartVal + 1)
                    *
            FROM    (
                        --select up to end number
                        SELECT  TOP (@EndVal)
                                *
                        FROM    @Table
                        ORDER BY TimeVal ASC
                    ) PageReversed
            ORDER BY TimeVal DESC
        ) PageVals
ORDER BY TimeVal ASC

As an example

SELECT  *
FROM    (
            SELECT  TOP (@EndVal - @StartVal + 1)
                    *
            FROM    (
                        SELECT TOP (@EndVal)
                                l.id,
                                l.timestamp
                        FROM log l, logfile lf
                        WHERE lf.server_id = #arguments.server_id#
                        and l.test_id = #arguments.test_id#"
                        and l.timestamp >= #arguments.report_from#
                        and l.timestamp < #arguments.report_to#
                        and l.logfile_id = lf.id
                        order by l.timestamp asc
                    ) PageReversed ORDER BY timestamp DESC
        ) PageVals
ORDER BY timestamp ASC

回复收藏 0 原文

~没有更多了~