Why is this query faster without an index?

Posted on 2024-10-05 05:20:34

I inherited a new system and I am trying to make some improvements on the data. I am trying to improve this table and can't seem to make sense of my findings.

I have the following table structure:

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL PRIMARY KEY,
    [RecvdDate] [varchar](10) NOT NULL,
    [yr] [int] NOT NULL,
    [Mnth] [int] NOT NULL,
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL)

This table has about 220k records in it. I need to return all records that have a date greater than a specific date, in this case 12/1/2009. This query will return about 66k records and it takes about 4 seconds to run. From past systems I have worked on, this seems high, especially given how few records are in the table, so I would like to bring that time down.

What would be some good ways to bring that down? I tried adding a date column to the table and converting the string date to an actual date column. Then I added an index on that date column, but the time stayed the same. Given that there aren't that many records, I can see how a table scan could be fast, but I would think that an index could bring that time down.

I have also considered just querying off the month and year columns, but I haven't tried that yet. I would like to keep the query based on the date column if possible, but if not I can change it.

Any help is appreciated.

EDIT: Here is the query I am running to test the speed of the table. I usually list out the columns, but for simplicity I used *:

SELECT *
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'

EDIT 2: I mentioned that I had tried to create a table with a date column that contained the RecvdDate data, but as a date rather than a varchar. That is what the TicketRecvdDateTime column is in the query above. The original query I am running against this table is:

SELECT *
FROM Calls
WHERE CAST(RecvdDate AS DATE) >= '12/01/2009'

蹲在坟头点根烟 2024-10-12 05:20:34

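You may be running into what is called the tipping point in SQL Server. Even if you have an appropriate index on the column, SQL Server may still decide to do a table scan if the expected number of rows returned exceeds some threshold (the "tipping point").

In your example this seems likely, since you are returning more than 1/4 of the rows in the table (about 66k of 220k). Here is a good article explaining this: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx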
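
One way to check whether the tipping point is in play is to compare the optimizer's own plan with a plan that is forced to use the nonclustered index. This is only a sketch: the index name IX_Tickets_RecvdDateTime is a hypothetical name for whatever index exists on TicketRecvdDateTime, and the date literal is written as '20091201' because that unseparated form does not depend on the session's language or date format settings.

```sql
-- Assumption: a nonclustered index named IX_Tickets_RecvdDateTime
-- (hypothetical name) exists on TicketRecvdDateTime.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Let the optimizer choose; with roughly 30% of the rows qualifying,
--    a scan is the likely choice.
SELECT *
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '20091201';

-- 2) Force the nonclustered index so the logical reads can be compared.
SELECT *
FROM _FirstSlaLevel_Tickets_New WITH (INDEX(IX_Tickets_RecvdDateTime))
WHERE TicketRecvdDateTime >= '20091201';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the forced version reports far more logical reads in the Messages output, the scan was the cheaper plan and the index was skipped deliberately.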

夜血缘 2024-10-12 05:20:34

SELECT * will usually give poor performance.

Either the index will be ignored or you'll end up with a key/bookmark lookup into the clustered index. Either way, both can run badly.

For example, if you had the query below, and the index on TicketRecvdDateTime INCLUDEd CallStatus, then it would most likely run as expected, because the index would be covering:

SELECT CallStatus
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'

This is in addition to Randy Minder's answer: a key/bookmark lookup may be cheap enough for a handful of rows but not for a large chunk of the table data.
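
A covering index for the narrower query above might look like the following sketch. The index name is hypothetical, and it assumes that _FirstSlaLevel_Tickets_New has a CallStatus column, as the example query implies.

```sql
-- Hypothetical index name; assumes CallStatus exists on this table.
-- TicketRecvdDateTime is the key column (used for the range seek);
-- CallStatus is stored at the leaf level so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_Tickets_RecvdDateTime_CallStatus
    ON _FirstSlaLevel_Tickets_New (TicketRecvdDateTime)
    INCLUDE (CallStatus);
```

Every column the query selects would have to appear in the key or the INCLUDE list for the index to stay covering, which is why this approach does not help SELECT *.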

恋竹姑娘 2024-10-12 05:20:34

Your query is faster without an index (or, more precisely, runs at the same speed with or without the index) because an index on RecvdDate will always be ignored in an expression like CAST(RecvdDate AS DATE) >= '12/01/2009'. This is a non-SARG-able expression, as it requires the column to be transformed through a function. In order for the index to even be considered, you have to express your filter criteria exactly on the column being indexed, not on an expression based on it. This would be the first step.
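
As a rough illustration of the difference, assuming RecvdDate has already been converted to a real date or datetime column and has an index on it, the two queries below ask for the same December 2009 rows but only the second one filters on the bare column:

```sql
-- Assumption: RecvdDate is a date/datetime column with an index on it.

-- Not SARG-able: the column is wrapped in functions, so the index on
-- RecvdDate cannot be used for a seek.
SELECT CallID, RecvdDate, CallStatus
FROM dbo.Calls
WHERE YEAR(RecvdDate) = 2009
  AND MONTH(RecvdDate) = 12;

-- SARG-able: the bare column is compared against constants,
-- so a range seek on the index is possible.
SELECT CallID, RecvdDate, CallStatus
FROM dbo.Calls
WHERE RecvdDate >= '20091201'
  AND RecvdDate <  '20100101';
```

The half-open range (>= start, < next start) also keeps the filter correct if the column carries a time portion.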

There are more steps:

Get rid of the VARCHAR(10) column for dates and replace it with the appropriate DATE or DATETIME column. Storing date and/or time as strings is riddled with problems. Not only for indexing, but also for correctness.
A table that is frequently scanned on a range based on a column (as most such call log tables are) should be clustered by that column.
It is highly unlikely you really need the yr and mnth columns. If you really do need them, then you probably need them as computed columns.

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL,
    [RecvdDate] [datetime] NOT NULL,  -- a real date type; datetime takes no length
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL,
    -- The primary key is nonclustered so the date column can be the clustering key.
    CONSTRAINT [PK_Calls_CallId] PRIMARY KEY NONCLUSTERED ([CallID]));

-- Cluster the table by the column used for range scans.
CREATE CLUSTERED INDEX cdxCalls ON Calls(RecvdDate);

SELECT *
FROM Calls
WHERE RecvdDate >= '12/01/2009';
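
If the yr and Mnth columns turn out to be needed after all, they could be derived from the date rather than stored separately. A minimal sketch, assuming the table is defined as above with RecvdDate as a datetime column and no existing yr or Mnth columns:

```sql
-- Assumes dbo.Calls has RecvdDate as datetime and no yr / Mnth columns yet.
-- Computed columns are derived from RecvdDate, so they cannot drift out of
-- sync with it.
ALTER TABLE dbo.Calls
    ADD yr   AS YEAR(RecvdDate),
        Mnth AS MONTH(RecvdDate);
```

Existing reports that select yr and Mnth would keep working, while range filters should still be written against RecvdDate itself.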

Of course, the proper structure of the table and indexes should be the result of careful analysis, considering all factors involved, including update performance, other queries etc. I recommend you start by going through all the topics included in Designing Indexes.
