Why is this query faster without an index?

Posted on 2024-10-05 05:20:34

I inherited a new system and I am trying to make some improvements on the data. I am trying to improve this table and can't seem to make sense of my findings.

I have the following table structure:

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL PRIMARY KEY,
    [RecvdDate] [varchar](10) NOT NULL,
    [yr] [int] NOT NULL,
    [Mnth] [int] NOT NULL,
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL)

This table has about 220k records in it. I need to return all records with a date on or after a specific date, in this case 12/1/2009. The query returns about 66k records and takes about 4 seconds to run. Compared with past systems I have worked on, this seems high, especially given how few records are in the table, so I would like to bring that time down.

So I'm wondering what would be some good ways to bring that down? I tried adding a date column to the table and converting the string date to an actual date. Then I added an index on that date column, but the time stayed the same. Given that there aren't that many records, I can see how a table scan could be fast, but I would have thought an index would bring the time down.

I have also considered just querying on the month and year columns, but I haven't tried it yet. I would prefer to keep the query based on the date column if possible, but if not I can change it.

Any help is appreciated.

EDIT: Here is the query I am running to test the speed of the table. I usually list out the columns, but for simplicity I used *:

SELECT *
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'

EDIT 2: As I mentioned, I tried creating a table with a date column that holds the RecvdDate data as a date rather than a varchar. That is the TicketRecvdDateTime column in the query above. The original query I am running against the Calls table is:

SELECT *
FROM Calls
WHERE CAST(RecvdDate AS DATE) >= '12/01/2009'

Comments (5)

蹲在坟头点根烟 2024-10-12 05:20:34

You may be encountering what is referred to as the tipping point in SQL Server. Even though you have an appropriate index on the column, SQL Server may decide to do a table scan anyway if the expected number of rows returned exceeds some threshold (the 'tipping point').

In your example this seems likely, since your query is returning more than 1/4 of the rows in the table (about 66k of 220k). The following is a good article that explains this: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx
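One way to see whether the tipping point is in play is to compare the optimizer's own choice against a forced use of the index. This is only a sketch: the index name IX_Tickets_RecvdDateTime and the dbo schema are assumptions, not names from the question, and the index hint is meant for experimenting rather than for production queries.

```sql
-- Report logical reads and elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Let the optimizer pick the plan (likely a scan for ~30% of the rows).
SELECT *
FROM dbo._FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '20091201';  -- yyyymmdd avoids month/day ambiguity

-- 2) Force the nonclustered index (hypothetical name) to see the cost of
--    a seek plus one lookup per matching row.
SELECT *
FROM dbo._FirstSlaLevel_Tickets_New WITH (INDEX(IX_Tickets_RecvdDateTime))
WHERE TicketRecvdDateTime >= '20091201';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the forced version reports noticeably more logical reads than the first, the optimizer is probably right to ignore the index for this query.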

夜血缘 2024-10-12 05:20:34

SELECT * will usually give poor performance.

Either the index will be ignored, or you'll end up with a key/bookmark lookup into the clustered index. It doesn't matter which: both can run badly.

For example, if you had the query below, and the index on TicketRecvdDateTime INCLUDEd CallStatus, then it would most likely run as expected, because the index would be covering:

SELECT CallStatus
FROM _FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '12/01/2009'
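A covering index for that query might be created roughly like this. The index name and the dbo schema are made up for illustration, and the sketch assumes the table actually has a CallStatus column.

```sql
-- Key on the filtered column; carry CallStatus in the leaf level so the
-- query can be answered from the index alone, with no lookup.
CREATE NONCLUSTERED INDEX IX_Tickets_RecvdDateTime_CallStatus
    ON dbo._FirstSlaLevel_Tickets_New (TicketRecvdDateTime)
    INCLUDE (CallStatus);
```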

This is in addition to Randy Minder's answer: a key/bookmark lookup may be cheap enough for a handful of rows but not for a large chunk of the table data.

恋竹姑娘 2024-10-12 05:20:34

Your query is faster without an index (or, more precisely, runs at the same speed with or without the index) because an index on RecvdDate will always be ignored in an expression like CAST(RecvdDate AS DATE) >= '12/01/2009'. This is a non-SARGable expression, as it requires the column to be transformed through a function. For the index to even be considered, you have to express your filter criteria exactly on the column being indexed, not on an expression based on it. This would be the first step.
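As a rough sketch of that first step, the string dates could be copied into a typed column so the filter can be written directly against it. The column name RecvdDateTyped and the index name are hypothetical, and the conversion assumes every RecvdDate value is a valid mm/dd/yyyy string (style 101); any bad value would make the UPDATE fail. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Add a real date column next to the varchar one.
ALTER TABLE dbo.Calls ADD RecvdDateTyped date NULL;
GO

-- Parse the strings once, instead of on every query.
UPDATE dbo.Calls
SET RecvdDateTyped = CONVERT(date, RecvdDate, 101);
GO

CREATE NONCLUSTERED INDEX IX_Calls_RecvdDateTyped
    ON dbo.Calls (RecvdDateTyped);
GO

-- The predicate is now on the bare indexed column, so a seek is possible.
SELECT CallID, RecvdDateTyped
FROM dbo.Calls
WHERE RecvdDateTyped >= '20091201';
```

With the original table definition, CallID is the clustered primary key, so it is carried in the nonclustered index and this particular SELECT should not need lookups.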

There are more steps:

  • Get rid of the VARCHAR(10) column for dates and replace it with the appropriate DATE or DATETIME column. Storing date and/or time as strings is riddled with problems. Not only for indexing, but also for correctness.
  • A table that is frequently scanned on a range based on a column (as most such call log tables are) should be clustered by that column.
  • It is highly unlikely you really need the yr and mnth columns. If you really do need them, then you probably need them as computed columns.

CREATE TABLE [dbo].[Calls](
    [CallID] [varchar](8) NOT NULL,
    [RecvdDate] [datetime] NOT NULL,  -- datetime takes no length argument
    [CallStatus] [varchar](50) NOT NULL,
    [Category] [varchar](100) NOT NULL,
    [QCall] [varchar](15) NOT NULL,
    [KOUNT] [int] NOT NULL,
    -- Nonclustered primary key, so the clustered index can go on the date
    CONSTRAINT [PK_Calls_CallId] PRIMARY KEY NONCLUSTERED ([CallID]));

-- Cluster the table by the column used for range scans
CREATE CLUSTERED INDEX cdxCalls ON [dbo].[Calls]([RecvdDate]);

SELECT *
FROM [dbo].[Calls]
WHERE [RecvdDate] >= '12/01/2009';
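If the yr and Mnth values turn out to be needed after all, they could be derived from the date rather than stored separately. This is a minimal sketch, assuming the restructured table no longer has stored yr and Mnth columns and that RecvdDate is a datetime:

```sql
-- Computed columns: always consistent with RecvdDate, nothing to maintain.
ALTER TABLE dbo.Calls
    ADD yr   AS YEAR(RecvdDate),
        Mnth AS MONTH(RecvdDate);
```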

Of course, the proper structure of the table and indexes should be the result of careful analysis, considering all factors involved, including update performance, other queries etc. I recommend you start by going through all the topics included in Designing Indexes.

肩上的翅膀 2024-10-12 05:20:34

Can you alter your query? If only a few columns are needed, you can alter the SELECT clause to return fewer columns. Then you can create a covering index that includes all columns referenced, including TicketRecvdDateTime.

You might create the index on TicketRecvdDateTime alone, but you may not avoid the tipping point that @Randy Minder discusses. However, a scan of the smaller index (smaller than the table) would read fewer pages.

再可℃爱ぅ一点好了 2024-10-12 05:20:34

Assuming RecvdDate is the TicketRecvdDateTime you are talking about:

SQL Server only compares a quoted value as a date if the column type is a date type such as DATE. Your query is probably comparing them as VARCHAR. Try adding a row with '99/99/0001' and see if it shows at the bottom.

If so, your query results are incorrect. Change type to DATE.
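To illustrate why a string comparison can give wrong answers, here is a small standalone check that touches no table. It assumes mm/dd/yyyy strings (style 101):

```sql
SELECT
    -- Compared as strings: '02...' sorts before '12...', so February 2010
    -- looks earlier than December 2009.
    CASE WHEN '02/01/2010' >= '12/01/2009'
         THEN 'later' ELSE 'earlier' END AS ComparedAsVarchar,
    -- Compared as dates: February 2010 is correctly later.
    CASE WHEN CONVERT(date, '02/01/2010', 101) >= CONVERT(date, '12/01/2009', 101)
         THEN 'later' ELSE 'earlier' END AS ComparedAsDate;
-- ComparedAsVarchar = 'earlier', ComparedAsDate = 'later'
```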

Note that VARCHAR does not index well; DATETIME does.

Check the query plan to see if it's using indexes. If the DB is small compared to available RAM, it may simply table scan and hold everything in memory.
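One way to look at the plan without running the query is SHOWPLAN_TEXT, sketched here against the table from the question (the dbo schema is an assumption). The SET statement has to be alone in its batch, hence the GO separators used by SSMS and sqlcmd.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- Not executed while SHOWPLAN_TEXT is ON; the estimated plan is returned instead.
SELECT *
FROM dbo._FirstSlaLevel_Tickets_New
WHERE TicketRecvdDateTime >= '20091201';
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In the output, look for an Index Seek on the date index versus a Clustered Index Scan or Table Scan.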

EDIT: On seeing your CAST/DATETIME edit, let me point out that parsing a date from a VARCHAR is a very expensive operation, and you are doing it 220k times, once per row. This will kill performance.

Also, you are no longer checking on an indexed field. A comparison against an expression involving an indexed field does not use the index.
