带有前导通配符的参数化查询的 SQL Server 性能

发布于 2024-09-14 17:39:07 字数 1538 浏览 3 评论 0原文

我有一个 SQL 2008 R2 数据库，其中一个表中有大约 200 万行，并且在使用参数化 SQL 时，我正在努力解决特定查询的性能问题。

在表中，有一个包含名称的字段：

[患者姓名] nvarchar NULL,

该字段上还有一个简单的索引：


CREATE NONCLUSTERED INDEX [IX_Study_PatientsName] ON [dbo].[Study] 
(
    [PatientsName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [INDEXES]
GO

当我在管理工作室中执行此查询时，执行大约需要 4 秒：


declare @StudyPatientsName nvarchar(64)
set @StudyPatientsName= '%Jones%'

SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

但是，当我执行此查询时：


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like '%Jones%'

执行需要半秒多一点。

从执行计划来看，没有参数化的查询使用上述索引进行了索引扫描，这显然是高效的。参数化查询使用索引，但在索引上进行范围查找。

部分问题在于使用前导通配符。当我删除前导通配符时，两个查询都会在不到一秒的时间内返回。不幸的是，我确实需要支持前导通配符。

我们有一个自行开发的 ORM，可以在问题根源处执行参数化查询。这些查询是根据用户的输入完成的，因此参数化查询对于避免 SQL 注入攻击等事情很有意义。我想知道是否有办法使参数化查询功能和非参数化查询一样？

我做了一些研究，寻找向查询优化器提供提示的不同方法，试图强制优化器对每个查询重做查询计划，但还没有发现任何可以提高性能的方法。我尝试了这个查询：


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName
OPTION ( OPTIMIZE FOR (@StudyPatientsName = '%Jones%'))

这是这个问题，但没有什么区别。

任何帮助将不胜感激。

原文

I have a SQL 2008 R2 Database with about 2 million rows in one of the tables and am struggling with the performance of a specific query when using parameterized SQL.

In the table, there's a field containing a name in it:

[PatientsName] nvarchar NULL,

There's also a simple index on the field:


CREATE NONCLUSTERED INDEX [IX_Study_PatientsName] ON [dbo].[Study] 
(
    [PatientsName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [INDEXES]
GO

When I do this query in the management studio, it takes around 4 seconds to execute:


declare @StudyPatientsName nvarchar(64)
set @StudyPatientsName= '%Jones%'

SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

But, when I execute this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like '%Jones%'

it takes a bit more than a half second to execute.

Looking at the execution plans, the query without parameterization does an Index Scan using the above mentioned index, which obviously is efficient. The parameterized query uses the index, but does a range seek on the index.

Part of the issue is having the leading wildcard. When I remove the leading wildcard, both queries return in a fraction of a second. Unfortunately, I do need to support leading wildcards.

We have a home grown ORM that does parameterized queries where the problem originated. These queries are done based on input from a user, so parameterized queries make sense to avoid things like a SQL injection attack. I'm wondering if there's a way to make the parameterized query function as well as the non-parameterized query?

I've done some research looking at different ways to give hints to the query optimizer, trying to force the optimizer to redo the query plan on each query, but haven't found anything yet to improve the performance. I tried this query:


SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName
OPTION ( OPTIMIZE FOR (@StudyPatientsName = '%Jones%'))

which was mentioned as a solution in this question, but it didn't make a difference.

Any help would be appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

打小就很酷 2024-09-21 17:39:07

看来您想强制扫描。有一个 FORCESEEK 提示，但我看不到任何类似的 FORCESCAN 提示。不过这应该可以做到。

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName + '' like @StudyPatientsName

尽管也许您可以对您的数据尝试以下操作，看看效果如何。

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName  like @StudyPatientsName
option (recompile)

It seems like you want to force a scan. There is a FORCESEEK hint but I couldn't see any analogous FORCESCAN hint. This should do it though.

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName + '' like @StudyPatientsName

Although maybe you could try the following on your data and see how it works out .

SELECT COUNT(*) 
FROM Study 
WHERE Study.PatientsName  like @StudyPatientsName
option (recompile)

回复收藏 0 原文

淡墨 2024-09-21 17:39:07

我认为提高性能的最佳机会是考虑使用全文索引。

回复收藏 0 原文

一指流沙 2024-09-21 17:39:07

我很难找到文档来验证这一点，但 IIRC, COUNT(*) 在 MS SQL 中执行全表扫描（而不是使用缓存值）。如果您针对不能为空和/或定义了索引的列运行它，我相信（再次，仍然找不到文档来确认，所以我可能在这里偏离基础）快一点。

当您将查询修改为以下内容时会发生什么：

SELECT COUNT(id) FROM Study WHERE Study.PatientsName Like @StudyPatientsName

或

SELECT COUNT(PatientsName) FROM Study 
WHERE Study.PatientsName 
LIKE @StudyPatientsName

I'm having trouble finding the documentation to verify this, but IIRC, COUNT(*) does a full table scan in MS SQL (as opposed to using a cached value). If you run it against a column which cannot be null and/or has an index defined, I believe (again, still can't find docs to confirm, so I could be off base here) that will be faster.

What happens when you modify the query to something like:

SELECT COUNT(id) FROM Study WHERE Study.PatientsName Like @StudyPatientsName

SELECT COUNT(PatientsName) FROM Study 
WHERE Study.PatientsName 
LIKE @StudyPatientsName

回复收藏 0 原文

乖乖公主 2024-09-21 17:39:07

如果所有其他方法都失败了，您可以尝试

SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName

也许您可以将其包装在 IF

IF substring(@StudyPatientsName, 1, 1) = '%'
    SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName
ELSE
    SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

编辑： 中，正如马丁指出的那样，对于此特定查询，这可能不是最好的方法，因为对现有查询进行索引扫描索引可能会更快。不过，它可能适用于类似的情况。

If all else fails you could try

SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName

Perhaps you could wrap it in an IF

IF substring(@StudyPatientsName, 1, 1) = '%'
    SELECT COUNT(*) FROM Study WITH(INDEX(0)) WHERE Study.PatientsName like @StudyPatientsName
ELSE
    SELECT COUNT(*) FROM Study WHERE Study.PatientsName like @StudyPatientsName

Edit: As martin pointed out, for this specific query this is probably not the best way to do it since an index scan of the existing index is likely faster. It might be applicable in similar situations, though.

回复收藏 0 原文

~没有更多了~