转换为日期时间仅在 WHERE 子句上失败?

发布于 2024-12-02 03:47:14 字数 1425 浏览 0 评论 0原文

我在执行某些 SQL Server 查询时遇到问题。事实证明,我有一个包含“Attibute_Name”和“Attibute_Value”字段的表,这些字段可以是任何类型,存储在 varchar 中。 (是的......我知道。)

特定属性的所有日期似乎都以“YYYY-MM-DD hh:mm:ss”格式存储(对此不能100%确定,这里有数百万条记录) ,所以我可以毫无问题地执行此代码:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'

但是,如果我执行以下代码:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

我将收到以下错误: 从字符串转换日期和/或时间时转换失败。

为什么在 where 子句上失败而不是在 select 子句上失败?

另一种线索:

如果我不使用 Attribute_Name 进行过滤,而是使用数据库 (PK) 中存储的实际 Attribute_ID,那么它将毫无问题地工作。

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_ID = 15
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

更新 谢谢大家的回答。我发现很难真正选择正确的答案,因为每个人都指出了一些对理解问题有用的东西。这肯定与执行顺序有关。 事实证明,我的第一个查询工作正常,因为先执行 WHERE 子句,然后执行 SELECT。 我的第二个查询由于相同的原因而失败(由于未过滤属性,因此在执行相同的 WHERE 子句时转换失败)。 我的第三个查询成功了,因为 ID 是索引 (PK) 的一部分,因此它优先,并且首先根据该条件向下钻取结果。

谢谢!

I'm having a problem with some SQL server queries. Turns out that I have a table with "Attibute_Name" and "Attibute_Value" fields, which can be of any type, stored in varchar. (Yeah... I know.)

All the dates for a particular attribute seem to be stored the the "YYYY-MM-DD hh:mm:ss" format (not 100% sure about that, there are millions of records here), so I can execute this code without problems:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'

However, if I execute the following code:

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_Name = 'SomeDate'
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

I will get the following error:
Conversion failed when converting date and/or time from character string.

How come it fails on the where clause and not on the select one?

Another clue:

If instead of filtering by the Attribute_Name I use the actual Attribute_ID stored in database (PK) it will work without problem.

select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from 
    ProductAttributes pa
    inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where 
    a.Attribute_ID = 15
    and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()

Update
Thanks everyone for the answers. I found it hard to actually choose a correct answer because everyone pointed out something that was useful to understanding the issue. It was definitely having to do with the order of execution.
Turns out that my first query worked correctly because the WHERE clause was executed first, then the SELECT.
My second query failed because of the same reason (as the Attributes were not filtered, the conversion failed while executing the same WHERE clause).
My third query worked because the ID was part of an index (PK), so it took precedence and it drilled down results on that condition first.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

杀手六號 2024-12-09 03:47:14

您似乎假设在 WHERE 子句中对谓词进行某种短路评估或保证排序。这是无法保证的。当您在列中混合数据类型时,处理它们的唯一安全方法是使用 CASE 表达式。

使用(例如)

CONVERT(DATETIME, 
      CASE WHEN ISDATE(pa.Attribute_Value) = 1 THEN pa.Attribute_Value END)

不使用

CONVERT(DATETIME, pa.Attribute_Value)

You seem to be assuming some sort of short circuiting evaluation or guaranteed ordering of the predicates in the WHERE clause. This is not guaranteed. When you have mixed datatypes in a column like that the only safe way of dealing them is with a CASE expression.

Use (e.g.)

CONVERT(DATETIME, 
      CASE WHEN ISDATE(pa.Attribute_Value) = 1 THEN pa.Attribute_Value END)

Not

CONVERT(DATETIME, pa.Attribute_Value)
永言不败 2024-12-09 03:47:14

如果转换位于 WHERE 子句中,则可能会评估比出现在投影列表中更多的记录(值)。我之前已经在不同的上下文中讨论过这个问题,请参阅 T-SQL 函数并不暗示一定的执行顺序关于 SQL Server 布尔运算符短路 。你的情况更简单,但很相似,最终的根本原因是相同的:在处理像 SQL 这样的声明性语言时,不要假设命令式执行顺序。

从很大程度上来说,最好的解决方案是清理数据并将列类型更改为 DATETIME 或 DATETIME2 类型。 所有其他解决方法都会有一个或另一个缺点,因此您最好只做正确的事情。

更新

仔细查看后(抱歉,我是@VLDB,只在会话之间查看)我意识到您有一个具有固有的无类型语义的 EAV 存储(attribute_value可以是字符串、日期、整数等)。我的观点是,最好的选择是在存储中使用 sql_variant 并一直到客户端(即项目 sql_variant)。您可以在客户端中凝聚类型,所有客户端 API 都有从 sql_variant 中提取内部类型的方法,请参阅 使用 sql_variant 数据(好吧,几乎所有客户端 API...使用 sql_variant 数据类型CLR)。使用sql_variant,您可以存储多种类型,而无需处理字符串表示形式的问题,您可以使用SQL_VARIANT_PROPERTY 来检查存储值中的 BaseType 之类的内容,您甚至可以像检查这样的想法强制数据类型正确性的约束。

If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appears in the projection list. I have talked about this before in different context, see T-SQL functions do no imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but is similar, and ultimately the root cause is the same: do not an assume an imperative execution order when dealing with a declarative language like SQL.

Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.

Update

After a closer look (sorry, I'm @VLDB and only peeking SO between sessions) I realize you have an EAV store with inherent type-free semantics (the attribute_value can bea string, a date, an int etc). My opinion is that your best bet is to use sql_variant in storage and all the way up to the client (ie. project sql_variant). You can coherce the type in the client, all client APIs have methods to extract the inner type from a sql_variant, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). With sql_variant you can store multiple types w/o the problems of going through a string representations, you can use SQL_VARIANT_PROPERTY to inspect things like the BaseType in the stored values, and you can even do thinks like check constraints to enforce data type correctness.

玩套路吗 2024-12-09 03:47:14

这与 SELECT 查询的处理顺序有关。 WHERE 子句在 SELECT 之前很久就被处理。它必须确定要包含/排除哪些行。使用该名称的子句必须使用扫描来调查所有行,其中一些行不包含有效的日期/时间数据,而键可能会导致查找,并且此时不包含任何无效行。 SELECT 列表中的转换最后执行,显然此时它不会尝试转换无效行。由于您将日期/时间数据与其他数据混合,因此您可以考虑将日期或数字数据存储在具有正确数据类型的专用列中。同时,您可以通过以下方式推迟检查:

SELECT /* ... */
FROM
(
  SELECT /* ... */
    FROM ProductAttributes AS pa
    INNER JOIN dbo.Attributes AS a
    ON a.Attribute_ID = pa.Attribute_ID
    WHERE a.Attribute_Name = 'SomeDate'
    AND ISDATE (pa.Attribute_Value) = 1
) AS z
WHERE CONVERT(CHAR(8), AttributeValue, 112) < CONVERT(CHAR(8), GETDATE(), 112);

但更好的答案可能是如果可能的话使用 Attribute_ID 键而不是名称。

This has to do with the order that a SELECT query is processed. The WHERE clause is processed long before the SELECT. It has to determine which rows to include/exclude. The clause that uses the name must use a scan that investigates all rows, some of which do not contain valid date/time data, whereas the key probably leads to a seek and none of the invalid rows are included at the point. The convert in the SELECT list is performed last, and clearly by this time it is not going to try to convert invalid rows. Since you're mixing date/time data with other data, you may consider storing date or numeric data in dedicated columns with correct data types. In the meantime, you can defer the check in the following way:

SELECT /* ... */
FROM
(
  SELECT /* ... */
    FROM ProductAttributes AS pa
    INNER JOIN dbo.Attributes AS a
    ON a.Attribute_ID = pa.Attribute_ID
    WHERE a.Attribute_Name = 'SomeDate'
    AND ISDATE (pa.Attribute_Value) = 1
) AS z
WHERE CONVERT(CHAR(8), AttributeValue, 112) < CONVERT(CHAR(8), GETDATE(), 112);

But the better answer is probably to use the Attribute_ID key instead of the name if possible.

西瓜 2024-12-09 03:47:14

对我来说似乎是一个数据问题。当您使用两种不同的方法选择数据时,请查看数据,尝试寻找不同的长度,然后选择不同集合中的项目并观察它们。还检查空值吗? (我不确定如果您尝试将 null 转换为日期时间会发生什么)

Seems like a data issue to me. Take a look at the data when you select it using the two different methods, try looking for distinct lengthsand then select the items in the different sets and eyeball them. Also check for nulls? (I'm not sure what happens if you try converting null to a datetime)

碍人泪离人颜 2024-12-09 03:47:14

我认为问题是你的数据库中有一个错误的日期(显然)。

在第一个示例中,您没有检查 WHERE 子句中的日期,a.attribute.Name = 'SomeDate' 的所有日期都是有效的,因此它永远不会尝试转换错误的日期日期。

在第二个示例中,添加到 WHERE 子句会导致查询计划实际转换所有这些日期并找到错误的日期,然后查看属性名称。

在您的第三个示例中,更改为使用 Attribute_Id 可能会更改查询计划,以便它仅查找 id = 15 的记录,然后检查这些记录是否具有有效日期,它们确实如此。 (也许 Attribute_Id 已建立索引,而 Attribute_name 未建立索引)

因此,您在某处有一个错误的日期,但它不在 Arttribute_id = 15 的任何记录中。

I think the problem is you have a bad date in your database (obviously).

In your first example, where you aren't checking the date in the WHERE clause, all of the dates where a.attribute.Name = 'SomeDate' are valid, so it never attempts to convert a bad date.

In your second example, the addition to the WHERE clause is causing the query plan to actually convert all those dates and finding the bad one and then looking at attribute name.

In your third example, changing to use Attribute_Id probably changes the query plan so that it only looks for those where id = 15 First, and then checks to see if those records have a valid date, which they do. (Perhaps Attribute_Id is indexed and Attribute_name isn't)

So, you have a bad date somewhere, but it's not on any records with Arttribute_id = 15.

陌伤ぢ 2024-12-09 03:47:14

您可以检查执行计划。可能是,在第一个查询中,第二个条件( CONVERT(DATETIME, pa.Attribute_Value) < GETDATE() )首先在所有行上进行评估,包括具有无效数据(不是日期)的行,而在第二种情况 - a.Attribute_ID = 15 首先被评估。因此排除具有非日期值的行。

顺便说一句,第二个可能也更快,如果您在选择列表中没有任何 Attributes 内容,则可以删除 inner join Attributes a on a.Attribute_ID = pa .Attribute_ID

就这一点而言,建议摆脱 EAV,趁现在还不算太晚:)

You can check execution plans. It might be that with first query the second criteria ( CONVERT(DATETIME, pa.Attribute_Value) < GETDATE() ) gets evaluated first over all rows including ones with invalid data (not date), while in the case of the second one - a.Attribute_ID = 15 get evaluated first. Thus excluding rows with non-date values.

btw, the second one might be faster also, and if you don't have anything from Attributes in the select list, you can get rid of inner join Attributes a on a.Attribute_ID = pa.Attribute_ID.

On that note, it would be advisable to get rid of EAV while it's no too late :)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文