转换为日期时间仅在 WHERE 子句上失败?
我在执行某些 SQL Server 查询时遇到问题。事实证明,我有一个包含“Attibute_Name”和“Attibute_Value”字段的表,这些字段可以是任何类型,存储在 varchar 中。 (是的......我知道。)
特定属性的所有日期似乎都以“YYYY-MM-DD hh:mm:ss”格式存储(对此不能100%确定,这里有数百万条记录) ,所以我可以毫无问题地执行此代码:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
但是,如果我执行以下代码:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
我将收到以下错误: 从字符串转换日期和/或时间时转换失败。
为什么在 where 子句上失败而不是在 select 子句上失败?
另一种线索:
如果我不使用 Attribute_Name 进行过滤,而是使用数据库 (PK) 中存储的实际 Attribute_ID,那么它将毫无问题地工作。
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_ID = 15
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
更新 谢谢大家的回答。我发现很难真正选择正确的答案,因为每个人都指出了一些对理解问题有用的东西。这肯定与执行顺序有关。 事实证明,我的第一个查询工作正常,因为先执行 WHERE 子句,然后执行 SELECT。 我的第二个查询由于相同的原因而失败(由于未过滤属性,因此在执行相同的 WHERE 子句时转换失败)。 我的第三个查询成功了,因为 ID 是索引 (PK) 的一部分,因此它优先,并且首先根据该条件向下钻取结果。
谢谢!
I'm having a problem with some SQL server queries. Turns out that I have a table with "Attibute_Name" and "Attibute_Value" fields, which can be of any type, stored in varchar. (Yeah... I know.)
All the dates for a particular attribute seem to be stored the the "YYYY-MM-DD hh:mm:ss" format (not 100% sure about that, there are millions of records here), so I can execute this code without problems:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
However, if I execute the following code:
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_Name = 'SomeDate'
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
I will get the following error:
Conversion failed when converting date and/or time from character string.
How come it fails on the where clause and not on the select one?
Another clue:
If instead of filtering by the Attribute_Name I use the actual Attribute_ID stored in database (PK) it will work without problem.
select /*...*/ CONVERT(DATETIME, pa.Attribute_Value)
from
ProductAttributes pa
inner join Attributes a on a.Attribute_ID = pa.Attribute_ID
where
a.Attribute_ID = 15
and CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
Update
Thanks everyone for the answers. I found it hard to actually choose a correct answer because everyone pointed out something that was useful to understanding the issue. It was definitely having to do with the order of execution.
Turns out that my first query worked correctly because the WHERE clause was executed first, then the SELECT.
My second query failed because of the same reason (as the Attributes were not filtered, the conversion failed while executing the same WHERE clause).
My third query worked because the ID was part of an index (PK), so it took precedence and it drilled down results on that condition first.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
您似乎假设在
WHERE
子句中对谓词进行某种短路评估或保证排序。这是无法保证的。当您在列中混合数据类型时,处理它们的唯一安全方法是使用CASE
表达式。使用(例如)
不使用
You seem to be assuming some sort of short circuiting evaluation or guaranteed ordering of the predicates in the
WHERE
clause. This is not guaranteed. When you have mixed datatypes in a column like that the only safe way of dealing them is with aCASE
expression.Use (e.g.)
Not
如果转换位于 WHERE 子句中,则可能会评估比出现在投影列表中更多的记录(值)。我之前已经在不同的上下文中讨论过这个问题,请参阅 T-SQL 函数并不暗示一定的执行顺序 和 关于 SQL Server 布尔运算符短路 。你的情况更简单,但很相似,最终的根本原因是相同的:在处理像 SQL 这样的声明性语言时,不要假设命令式执行顺序。
从很大程度上来说,最好的解决方案是清理数据并将列类型更改为 DATETIME 或 DATETIME2 类型。 所有其他解决方法都会有一个或另一个缺点,因此您最好只做正确的事情。
更新
仔细查看后(抱歉,我是@VLDB,只在会话之间查看)我意识到您有一个具有固有的无类型语义的 EAV 存储(
attribute_value
可以是字符串、日期、整数等)。我的观点是,最好的选择是在存储中使用sql_variant
并一直到客户端(即项目sql_variant
)。您可以在客户端中凝聚类型,所有客户端 API 都有从sql_variant
中提取内部类型的方法,请参阅 使用 sql_variant 数据(好吧,几乎所有客户端 API...使用 sql_variant 数据类型CLR)。使用sql_variant
,您可以存储多种类型,而无需处理字符串表示形式的问题,您可以使用SQL_VARIANT_PROPERTY
来检查存储值中的BaseType
之类的内容,您甚至可以像检查这样的想法强制数据类型正确性的约束。If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appears in the projection list. I have talked about this before in different context, see T-SQL functions do no imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but is similar, and ultimately the root cause is the same: do not an assume an imperative execution order when dealing with a declarative language like SQL.
Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.
Update
After a closer look (sorry, I'm @VLDB and only peeking SO between sessions) I realize you have an EAV store with inherent type-free semantics (the
attribute_value
can bea string, a date, an int etc). My opinion is that your best bet is to usesql_variant
in storage and all the way up to the client (ie. projectsql_variant
). You can coherce the type in the client, all client APIs have methods to extract the inner type from asql_variant
, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). Withsql_variant
you can store multiple types w/o the problems of going through a string representations, you can useSQL_VARIANT_PROPERTY
to inspect things like theBaseType
in the stored values, and you can even do thinks like check constraints to enforce data type correctness.这与
SELECT
查询的处理顺序有关。WHERE
子句在SELECT
之前很久就被处理。它必须确定要包含/排除哪些行。使用该名称的子句必须使用扫描来调查所有行,其中一些行不包含有效的日期/时间数据,而键可能会导致查找,并且此时不包含任何无效行。 SELECT 列表中的转换最后执行,显然此时它不会尝试转换无效行。由于您将日期/时间数据与其他数据混合,因此您可以考虑将日期或数字数据存储在具有正确数据类型的专用列中。同时,您可以通过以下方式推迟检查:但更好的答案可能是如果可能的话使用
Attribute_ID
键而不是名称。This has to do with the order that a
SELECT
query is processed. TheWHERE
clause is processed long before theSELECT
. It has to determine which rows to include/exclude. The clause that uses the name must use a scan that investigates all rows, some of which do not contain valid date/time data, whereas the key probably leads to a seek and none of the invalid rows are included at the point. The convert in the SELECT list is performed last, and clearly by this time it is not going to try to convert invalid rows. Since you're mixing date/time data with other data, you may consider storing date or numeric data in dedicated columns with correct data types. In the meantime, you can defer the check in the following way:But the better answer is probably to use the
Attribute_ID
key instead of the name if possible.对我来说似乎是一个数据问题。当您使用两种不同的方法选择数据时,请查看数据,尝试寻找不同的长度,然后选择不同集合中的项目并观察它们。还检查空值吗? (我不确定如果您尝试将 null 转换为日期时间会发生什么)
Seems like a data issue to me. Take a look at the data when you select it using the two different methods, try looking for distinct lengthsand then select the items in the different sets and eyeball them. Also check for nulls? (I'm not sure what happens if you try converting null to a datetime)
我认为问题是你的数据库中有一个错误的日期(显然)。
在第一个示例中,您没有检查
WHERE
子句中的日期,a.attribute.Name = 'SomeDate' 的所有日期都是有效的,因此它永远不会尝试转换错误的日期日期。在第二个示例中,添加到
WHERE
子句会导致查询计划实际转换所有这些日期并找到错误的日期,然后查看属性名称。在您的第三个示例中,更改为使用
Attribute_Id
可能会更改查询计划,以便它仅查找 id = 15 的记录,然后检查这些记录是否具有有效日期,它们确实如此。 (也许Attribute_Id
已建立索引,而Attribute_name
未建立索引)因此,您在某处有一个错误的日期,但它不在 Arttribute_id = 15 的任何记录中。
I think the problem is you have a bad date in your database (obviously).
In your first example, where you aren't checking the date in the
WHERE
clause, all of the dates where a.attribute.Name = 'SomeDate' are valid, so it never attempts to convert a bad date.In your second example, the addition to the
WHERE
clause is causing the query plan to actually convert all those dates and finding the bad one and then looking at attribute name.In your third example, changing to use
Attribute_Id
probably changes the query plan so that it only looks for those where id = 15 First, and then checks to see if those records have a valid date, which they do. (PerhapsAttribute_Id
is indexed andAttribute_name
isn't)So, you have a bad date somewhere, but it's not on any records with Arttribute_id = 15.
您可以检查执行计划。可能是,在第一个查询中,第二个条件(
CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
)首先在所有行上进行评估,包括具有无效数据(不是日期)的行,而在第二种情况 -a.Attribute_ID = 15
首先被评估。因此排除具有非日期值的行。顺便说一句,第二个可能也更快,如果您在选择列表中没有任何
Attributes
内容,则可以删除inner join Attributes a on a.Attribute_ID = pa .Attribute_ID
。就这一点而言,建议摆脱 EAV,趁现在还不算太晚:)
You can check execution plans. It might be that with first query the second criteria (
CONVERT(DATETIME, pa.Attribute_Value) < GETDATE()
) gets evaluated first over all rows including ones with invalid data (not date), while in the case of the second one -a.Attribute_ID = 15
get evaluated first. Thus excluding rows with non-date values.btw, the second one might be faster also, and if you don't have anything from
Attributes
in the select list, you can get rid ofinner join Attributes a on a.Attribute_ID = pa.Attribute_ID
.On that note, it would be advisable to get rid of EAV while it's no too late :)