SQL Server 2008: empty string vs. space

Posted on 2024-08-04 02:35:21

I ran into something a little odd this morning and thought I'd submit it for commentary.

Can someone explain why the following SQL query prints 'equal' when run against SQL 2008? The db compatibility level is set to 100.

if '' = ' '
    print 'equal'
else
    print 'not equal'

And this returns 0:

select (LEN(' '))

It appears to be auto trimming the space. I have no idea if this was the case in previous versions of SQL Server, and I no longer have any around to even test it.

I ran into this because a production query was returning incorrect results. I cannot find this behavior documented anywhere.

Does anyone have any information on this?

Comments (9)

送君千里 2024-08-11 02:35:21

varchars and equality are thorny in TSQL. The LEN function says:

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

You need to use DATALENGTH to get a true byte count of the data in question. If you have Unicode data (nchar/nvarchar), note that each character typically occupies two bytes, so the value you get will not be the same as the length of the text in characters.

print(DATALENGTH(' ')) --1
print(LEN(' '))        --0
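
To make the difference concrete, here is a small sketch comparing the two functions on a non-Unicode and a Unicode literal. The literal values are only illustrative, and the byte counts in the comments assume a single-byte code page for varchar and two bytes per character for nvarchar.

```sql
-- LEN ignores trailing blanks; DATALENGTH counts the stored bytes.
select
    LEN('abc  ')          as len_varchar,     -- 3
    DATALENGTH('abc  ')   as bytes_varchar,   -- 5
    LEN(N'abc  ')         as len_nvarchar,    -- 3
    DATALENGTH(N'abc  ')  as bytes_nvarchar   -- 10
```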

When it comes to equality of expressions, the two strings are compared for equality like this:

  • Get Shorter string
  • Pad with blanks until length equals that of longer string
  • Compare the two

It's the middle step that is causing unexpected results - after that step, you are effectively comparing whitespace against whitespace - hence they are seen to be equal.
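
A minimal sketch of what that padding implies in practice: only trailing blanks disappear in the comparison, while leading blanks still count. The literals here are illustrative only.

```sql
-- Trailing blanks are padded away by =, leading blanks are not.
if 'abc' = 'abc   '  print 'trailing: equal' else print 'trailing: not equal'
if 'abc' = ' abc'    print 'leading: equal'  else print 'leading: not equal'
```

Run as written, the first statement should print 'trailing: equal' and the second 'leading: not equal'.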

LIKE behaves better than = in the "blanks" situation because it doesn't perform blank-padding on the pattern you were trying to match:

if '' = ' '
print 'eq'
else
print 'ne'

Will give eq while:

if '' LIKE ' '
print 'eq'
else
print 'ne'

Will give ne

Be careful with LIKE, though, because it is not symmetrical: it treats trailing whitespace as significant in the pattern (RHS) but not in the match expression (LHS). The following example, taken from another source, shows this:

declare @Space nvarchar(10)
declare @Space2 nvarchar(10)

set @Space = ''
set @Space2 = ' '

if @Space like @Space2
print '@Space Like @Space2'
else
print '@Space Not Like @Space2'

if @Space2 like @Space
print '@Space2 Like @Space'
else
print '@Space2 Not Like @Space'

@Space Not Like @Space2
@Space2 Like @Space

黑凤梨 2024-08-11 02:35:21

The = operator in T-SQL is not so much "equals" as it is "are the same word/phrase, according to the collation of the expression's context," and LEN is "the number of characters in the word/phrase." No collations treat trailing blanks as part of the word/phrase preceding them (though they do treat leading blanks as part of the string they precede).

If you need to distinguish 'this' from 'this ', you shouldn't use the "are the same word or phrase" operator because 'this' and 'this ' are the same word.

Contributing to the way = works is the idea that the string-equality operator should depend on its arguments' contents and on the collation context of the expression, but it shouldn't depend on the types of the arguments, if they are both string types.

The natural language concept of "these are the same word" isn't typically precise enough to be able to be captured by a mathematical operator like =, and there's no concept of string type in natural language. Context (i.e., collation) matters (and exists in natural language) and is part of the story, and additional properties (some that seem quirky) are part of the definition of = in order to make it well-defined in the unnatural world of data.

On the type issue, you wouldn't want words to change when they are stored in different string types. For example, the types VARCHAR(10), CHAR(10), and CHAR(3) can all hold representations of the word 'cat', and ? = 'cat' should let us decide if a value of any of these types holds the word 'cat' (with issues of case and accent determined by the collation).
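
As a rough illustration of that point, the sketch below stores 'cat' in the three types mentioned. The variable names are made up for this example. The stored byte counts differ, yet = treats all three as the same word.

```sql
declare @v10 varchar(10), @c10 char(10), @c3 char(3)
set @v10 = 'cat'
set @c10 = 'cat'
set @c3  = 'cat'

-- The stored byte counts differ: char(10) is blank-padded to its full width.
select DATALENGTH(@v10) as v10_bytes,   -- 3
       DATALENGTH(@c10) as c10_bytes,   -- 10
       DATALENGTH(@c3)  as c3_bytes     -- 3

-- ...yet all three compare equal to 'cat' and to each other.
if @v10 = 'cat' and @c10 = 'cat' and @c3 = 'cat' and @c10 = @c3
    print 'all hold the word cat'
else
    print 'some differ'
```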

Response to JohnFx's comment:

See Using char and varchar Data in Books Online. Quoting from that page, emphasis mine:

Each char and varchar data value has a collation. Collations define
attributes such as the bit patterns used to represent each character,
comparison rules, and sensitivity to case or accenting.

I agree it could be easier to find, but it's documented.

Worth noting, too, is that SQL's semantics, in which = has to do with the real-world data and the context of the comparison (as opposed to something about the bits stored on the computer), have been part of SQL for a long time. The premise of RDBMSs and SQL is the faithful representation of real-world data, hence its support for collations many years before similar ideas (such as CultureInfo) entered the realm of Algol-like languages. The premise of those languages (at least until very recently) was problem-solving in engineering, not management of business data. (Recently, the use of similar languages in non-engineering applications like search has been making some inroads, but Java, C#, and so on are still struggling with their non-businessy roots.)

In my opinion, it's not fair to criticize SQL for being different from "most programming languages." SQL was designed to support a framework for business data modeling that's very different from engineering, so the language is different (and better for its goal).

Heck, when SQL was first specified, some languages didn't have any built-in string type. And in some languages still, the equals operator between strings doesn't compare character data at all, but compares references! It wouldn't surprise me if in another decade or two, the idea that == is culture-dependent becomes the norm.

橙味迷妹 2024-08-11 02:35:21

I found this blog article which describes the behavior and explains why.

The SQL standard requires that string
comparisons, effectively, pad the
shorter string with space characters.

This leads to the surprising result
that N'' = N' ' (the empty string
equals a string of one or more space
characters) and more generally any
string equals another string if they
differ only by trailing spaces. This
can be a problem in some contexts.

More information is also available in Microsoft KB article 316626.

傲鸠 2024-08-11 02:35:21

There was a similar question a while ago where I looked into a similar problem here

Instead of LEN(' '), use DATALENGTH(' ') - that gives you the correct value.

The solutions were to use a LIKE clause as explained in my answer in there, and/or include a 2nd condition in the WHERE clause to check DATALENGTH too.

Have a read of that question and links in there.

梦一生花开无言 2024-08-11 02:35:21

To compare a value to a literal space, you may also use this technique as an alternative to the LIKE statement:

IF ASCII('') = 32 PRINT 'equal' ELSE PRINT 'not equal'
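
One thing to keep in mind is that ASCII inspects only the first character of its argument, so on its own the check also matches values such as ' x'. A hedged sketch of a stricter variant, assuming a varchar value in a single-byte code page and using a hypothetical variable @value, pairs it with DATALENGTH:

```sql
declare @value varchar(10)   -- hypothetical variable, for illustration only
set @value = ' '

-- ASCII looks only at the first character; DATALENGTH confirms there is exactly one byte.
if ASCII(@value) = 32 and DATALENGTH(@value) = 1
    print 'exactly one space'
else
    print 'something else'
```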

月寒剑心 2024-08-11 02:35:21

Sometimes one has to deal with spaces in data, with or without any other characters. Using NULL would be the better idea, but it is not always an option.
I ran into the situation described and solved it this way:

... where ('>' + @space + '<') <> ('>' + @space2 + '<')

Of course you wouldn't do that for a large amount of data, but it is quick and easy for a few hundred rows ...
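
Spelled out as a self-contained sketch (the variable values are assumed here to be an empty string and a single space), the wrapping works because the closing '<' means the blank is no longer at the end of the string:

```sql
declare @space varchar(10), @space2 varchar(10)
set @space  = ''
set @space2 = ' '

-- Plain = pads the shorter side, so these compare as equal.
if @space = @space2 print 'plain: equal' else print 'plain: not equal'

-- With a closing delimiter the blank is no longer trailing, so it counts.
if ('>' + @space + '<') = ('>' + @space2 + '<')
    print 'wrapped: equal'
else
    print 'wrapped: not equal'
```

This should print 'plain: equal' followed by 'wrapped: not equal'.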

南薇 2024-08-11 02:35:21

As SQL-92, section 8.2 (comparison predicate), says:

If the length in characters of X is not equal to the length
in characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad
characters, where the pad character is chosen based on CS. If
CS has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any
character in the character set of X and Y that collates less
than any string under CS. Otherwise, the pad character is a
<space>.

萌能量女王 2024-08-11 02:35:21

How to tell records apart in a SELECT on char/varchar fields in SQL Server.
Example:

declare @mayvar as varchar(10)

set @mayvar = 'data '

select mykey, myfield from mytable where myfield = @mayvar

expected

mykey (int) | myfield (varchar(10))

1 | 'data '

obtained

mykey | myfield

1 | 'data'
2 | 'data '

Even if I write
select mykey, myfield from mytable where myfield = 'data' (without the final blank)
I get the same results.

How did I solve it? Like this:

select mykey, myfield
from mytable
where myfield = @mayvar 
and DATALENGTH(isnull(myfield,'')) = DATALENGTH(@mayvar)

If there is an index on myfield, it will still be used in either case.

I hope it will be helpful.
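
For anyone who wants to try this without an existing table, here is a self-contained sketch using a table variable. The table variable, its rows, and the key values are hypothetical, and the sketch assumes ANSI_PADDING is ON (the usual default) so that the trailing blank is actually stored.

```sql
-- Hypothetical table and data, mirroring the example above.
declare @mytable table (mykey int, myfield varchar(10))
insert into @mytable (mykey, myfield) values (1, 'data')
insert into @mytable (mykey, myfield) values (2, 'data ')

declare @mayvar varchar(10)
set @mayvar = 'data '

-- Equality alone matches both rows.
select mykey, myfield from @mytable where myfield = @mayvar

-- Adding the byte-length check keeps only the row stored with the trailing blank
-- (assuming ANSI_PADDING ON, so the blank was preserved on insert).
select mykey, myfield
from @mytable
where myfield = @mayvar
  and DATALENGTH(myfield) = DATALENGTH(@mayvar)
```

The isnull wrapper from the original query is left out here because a NULL myfield already fails the equality test.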

娜些时光,永不杰束 2024-08-11 02:35:21

Another way is to transform the strings so that the space becomes significant.
For example, replace the space with a known character such as _:

if REPLACE('hello',' ','_') = REPLACE('hello ',' ','_')
    print 'equal'
else
    print 'not equal'

returns: not equal

Not ideal, and probably slow, but it is another quick way forward when you need one.
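
One caveat worth sketching: REPLACE rewrites every space, not only trailing ones, so two different values can collide if one of them already contains the substitute character. The literals below are illustrative only.

```sql
-- Interior spaces are rewritten too, so distinct values can collide.
if REPLACE('a b', ' ', '_') = REPLACE('a_b', ' ', '_')
    print 'equal'
else
    print 'not equal'
```

This prints 'equal' even though the inputs differ, so the substitute character should be one that cannot occur in the data.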
