非索引数据最快的 SQL 查询

发布于 2024-10-01 14:48:48 字数 1419 浏览 1 评论 0原文

我正在针对 SQL Server 2005 数据库构建一些自定义报告。该数据库属于我们运行的第三方管理应用程序。我提取的数据不是该网站的主要目的,因此除了时间戳列之外,数据大部分都没有索引。目前,只涉及一张表——一张大约有 7 亿行的表。因此,当我对它运行一个只返回 50 行的查询时,它必须轮询所有 7 亿行。

我希望加快速度,但不想对添加到 WHERE 子句的每一列建立索引 - 我不知道添加这么多索引最终会大大提高速度(或者我是这样)错误的?)。因此,我很好奇如果我无法向表中添加任何新索引,最佳实践是什么!

存储过程似乎不是最合适的。索引视图可能是最好的主意?想法?

这是表架构:

DeviceGuid (PK, uniqueidentifier, not null)
DeviceID (int, not null)
WindowsEventID (PK, int, not null) (indexed)
EventLog (varchar(64), not null)
EventSource (varchar(64), not null)
EventID (int, not null)
Severity (int, not null)
Description (nvarchar(max), not null)
TimeOfEvent (PK, datetime, not null) (indexed)
OccurrenceNbr (int, not null)

这是一个示例查询:

SELECT COUNT(*) AS NumOcc, EventID, EventLog, EventSource, Severity, TimeOfEvent, Description
FROM WindowsEvent
WHERE DeviceID='34818'
    AND Severity=1
    AND TimeOfEvent >= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/10/27 12:00:00 AM')
    AND TimeOfEvent <= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/11/3 12:00:00 AM')
    AND EventID<>34113
    AND EventID<>34114
    AND EventID<>34112
    AND EventID<>57755
    AND EventSource<>'AutoImportSvc.exe'
    AND EventLog='Application'
GROUP BY EventID, EventLog, EventSource, Severity, Description
ORDER BY NumOcc DESC

也许查询很糟糕......它在 4.5 分钟内返回 53 行。

I'm building a few custom reports against a SQL Server 2005 database. The database belongs to a 3rd party management application we run. The data I'm pulling isn't among the primary purposes of the site, so the data is largely non-indexed except for the timestamp column. For now, there is only one table involved -- a table of some 700 million rows. So when I run a query against it that should return only 50 rows, it has to poll all 700mil.

I'm looking to speed this up, but don't want to index every column that I'm adding to the WHERE clause -- I don't know that adding that many indexes will end up improving the speed much (or am I wrong?). So I'm curious what the best practice would be if I couldn't add any new indexes to the table!

Stored procedure doesn't seem like the best fit. An indexed view might be the best idea? Thoughts?

Here's the table schema:

DeviceGuid (PK, uniqueidentifier, not null)
DeviceID (int, not null)
WindowsEventID (PK, int, not null) (indexed)
EventLog (varchar(64), not null)
EventSource (varchar(64), not null)
EventID (int, not null)
Severity (int, not null)
Description (nvarchar(max), not null)
TimeOfEvent (PK, datetime, not null) (indexed)
OccurrenceNbr (int, not null)

Here's a sample query:

SELECT COUNT(*) AS NumOcc, EventID, EventLog, EventSource, Severity, TimeOfEvent, Description
FROM WindowsEvent
WHERE DeviceID='34818'
    AND Severity=1
    AND TimeOfEvent >= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/10/27 12:00:00 AM')
    AND TimeOfEvent <= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/11/3 12:00:00 AM')
    AND EventID<>34113
    AND EventID<>34114
    AND EventID<>34112
    AND EventID<>57755
    AND EventSource<>'AutoImportSvc.exe'
    AND EventLog='Application'
GROUP BY EventID, EventLog, EventSource, Severity, Description
ORDER BY NumOcc DESC

Maybe the query sucks ... it returns 53 rows in 4.5 minutes.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

定格我的天空 2024-10-08 14:48:48

这里的最终解决方案是针对索引字段运行查询,然后在运行查询的应用程序中过滤它们。两个字段最终包含足够相似的信息,我可以根据一个索引进行查询并获得与我想要的数据非常接近的近似值。我循环返回并从结果列表中删除了所有不匹配的实体。花费的时间少得多!

The final solution here was to run a query against the indexed fields, then filter them within the application running the query. Two fields ended up containing similar enough information that I could query against one index and get a very close approximation of the data I wanted. I looped back through and removed any non-matching entities from the result list. Took MUCH less time!

美煞众生 2024-10-08 14:48:48

如果您的查询不使用任何索引,那将非常糟糕。您不需要在每一列上都有索引,但您需要在右列上有一个索引。鉴于 TimeOfEvent 已被索引,它可能不太适合您的需求。

右列将取决于数据的分布。最好的索引可能是提供最高选择性的索引(即,当您知道索引的键值时,它会返回最少的行)。如果您知道提供最佳选择性的列,则可以尝试在其上建立索引。

为了帮助确定最佳索引,您可以使用 SSMS 中的显示估计执行计划。这将帮助您了解将使用哪个索引。添加索引后,您可以运行查询并使用执行计划评估结果。当然,观察经过的时间也会有帮助。

If your query isn't using any indexes it'll be real bad. You don't need an index on every column, but you'll want one on the right column. Given that TimeOfEvent is already indexed, it may not be a great one for your needs.

The right column is going to depend on the distribution of your data. The best index will probably be the index that provides highest selectivity (i.e., when you know a key value for the index it returns the fewest rows). If you know the column that provides the best selectivity, you can try the index on it.

To help determine the best index, you can use the Display Estimated Execution Plan in SSMS. This will help you see which index will be used. After adding an index, you can run your query and evaluate the results with the execution plan. And, of course, observing the elapsed time will help too.

始终不够 2024-10-08 14:48:48

使用双 row_number 技巧尝试此方法:

SELECT  RN_Desc as NumOcc, *
FROM    (
        SELECT  row_number() Over(partition by EventId order by EventLog, EventSource, Severity, Description) as RN_Asc,
                row_number() Over(partition by EventId order by EventLog desc, EventSource desc, Severity desc, Description desc) as RN_Desc,
                *
        FROM    WindowsEvent 
        WHERE   DeviceID='34818' 
                AND Severity=1 
                AND TimeOfEvent >= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/10/27 12:00:00 AM') 
                AND TimeOfEvent <= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/11/3 12:00:00 AM') 
                AND EventID<>34113 
                AND EventID<>34114 
                AND EventID<>34112 
                AND EventID<>57755 
                AND EventSource<>'AutoImportSvc.exe' 
                AND EventLog='Application' 
        ) t
WHERE   RN_Asc = 1 
ORDER BY NumOcc DESC 

使用此方法,引擎不需要执行任何聚合,只需通过表一次即可。
如果它不起作用,请尝试使用行号的部分顺序和分区以获得正确的分组。

try this method using the double row_number trick:

SELECT  RN_Desc as NumOcc, *
FROM    (
        SELECT  row_number() Over(partition by EventId order by EventLog, EventSource, Severity, Description) as RN_Asc,
                row_number() Over(partition by EventId order by EventLog desc, EventSource desc, Severity desc, Description desc) as RN_Desc,
                *
        FROM    WindowsEvent 
        WHERE   DeviceID='34818' 
                AND Severity=1 
                AND TimeOfEvent >= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/10/27 12:00:00 AM') 
                AND TimeOfEvent <= DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), '2010/11/3 12:00:00 AM') 
                AND EventID<>34113 
                AND EventID<>34114 
                AND EventID<>34112 
                AND EventID<>57755 
                AND EventSource<>'AutoImportSvc.exe' 
                AND EventLog='Application' 
        ) t
WHERE   RN_Asc = 1 
ORDER BY NumOcc DESC 

with this the engine doesn't need to do any aggregations just a single pass through the table.
if it doesn't work try playing with the order by and partition by parts of the rownumber to get correct groupings.

感受沵的脚步 2024-10-08 14:48:48

这很简单,但我会尝试将索引值作为第一个测试

This is pretty simpleminded, but I'd try the indexed value as the first test in the

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文