SQL Server 排序规则选择
今晚我花了很多时间试图找到有关在 SQL Server 2008 R2 安装中应用哪种排序规则的指南,但几乎所有在线内容基本上都在说“选择适合您的排序规则”。 极其没有帮助。
我的背景是新应用程序开发。我不担心与先前版本的 SQL Server(即 <= 2005)的向后兼容性。我对存储代表全球语言的数据非常感兴趣 - 不仅仅是拉丁语。我在网上找到的很少的帮助表明我应该避免所有“SQL_”排序规则。这将我的选择范围缩小到使用基于 Windows 区域设置的二进制或“非二进制”排序规则。
如果我使用二进制,我认为我应该使用“BIN2”。这就是我的问题。如何确定是否应该使用 BIN2 还是仅使用“Latin1_General_100_XX_XX_XX”?我的蜘蛛感知告诉我,BIN2 将提供“不太准确”的排序规则,但对所有语言都更通用(而且速度更快!)。我还怀疑二进制排序规则区分大小写、区分重音和区分假名(是吗?)。相反,我怀疑非二进制排序规则最适合基于拉丁语的语言。
该文档不支持我的上述主张,我正在做出有根据的猜测。但这就是问题所在!为什么在线文档如此薄弱,以至于只能靠猜测来选择?甚至“SQL Server 2008 Internals”一书也讨论了各种选择,但没有解释为什么以及何时选择二进制排序规则(与非二进制 Windows 排序规则相比)。犯罪!!!
I've spent a lot of time this evening trying to find guidance about which choice of collation to apply in my SQL Server 2008 R2 installation, but almost everything online basically says "choose what is right for you." Extremely unhelpful.
My context is new application development. I am not worrying about backward compatibility with a prior version of SQL Server (viz. <= 2005). I am very interested in storing data representing languages from around the globe - not just Latin based. What very little help I've found online suggests I should avoid all "SQL_" collations. This narrows my choice to using either a binary or "not binary" collation based on the Windows locale.
If I use binary, I gather I should use "BIN2." So this is my question. How do I determine whether I should use BIN2 or just "Latin1_General_100_XX_XX_XX"? My spider-sense tells me that BIN2 will provide collation that is "less accurate," but more generic for all languages (and fast!). I also suspect the binary collation is case sensitive, accent sensitive, and kana-sensitive (yes?). In contrast, I suspect the non-binary collation would work best for Latin-based languages.
The documentation doesn't support my claims above, I'm making educated guesses. But this is the problem! Why is the online documentation so thin that the choice is left to guesswork? Even the book "SQL Server 2008 Internals" discussed the variety of choices, without explaining why and when binary collation would be chosen (compared with non-binary windows collation). Criminy!!!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
恕我直言,“SQL Server 2008 Internals”对这个主题进行了很好的讨论。
二进制排序规则很棘手,如果你打算支持人类的文本搜索,你最好选择非二进制排序规则。如果您已经调整了其他所有内容(首先是架构),并且在区分大小写和区分重音是所需行为的情况下(例如密码哈希),则二进制文件可以很好地获得一点性能。二进制排序规则实际上“更精确”,因为它不考虑相似的文本。不过,您从中得到的排序顺序仅适用于机器。
SQL_* 排序规则和本机 Windows 排序规则之间只有细微的差别。如果您不受兼容性的限制,请选择本机的,因为据我所知,它们是前进的方向。
排序规则决定排序顺序和相等性。您可以选择真正最适合您的用户的内容。据了解,您将为数据使用 unicode 类型(如 nvarchar)以支持国际文本。排序规则会影响非 unicode 列中可以存储的内容,但这不会影响您。
真正重要的是避免在 WHERE 子句中混合排序规则,因为这就是不使用索引而付出罚款的地方。据我所知,没有支持所有语言的银弹排序规则。您可以为大多数用户选择一个,也可以为每种语言提供不同的本地化支持栏。
一件重要的事情是让服务器排序规则与数据库排序规则相同。如果您计划使用临时表作为临时表(如果使用“CREATE TABLE #ttt...”创建)选择服务器排序规则,那么您的生活会变得更加轻松,并且您会遇到需要解决的排序规则冲突指定显式排序规则。这也会对性能产生影响。
"SQL Server 2008 Internals" has a good discussion on the topic imho.
Binary collation is tricky, if you intend to support text search for human beings, you'd better go with non-binary. Binary is good to gain a tiny bit of performance if you have tuned everything else (architecture first) and in cases where case sensitivity and accent sensitivity are a desired behavior, like password hashes for instance. Binary collation is actually "more precise" in a sense that it does not consider similar texts. The sort orders you get out of there are good for machines only though.
There is only a slight difference between the SQL_* collations and the native windows ones. If you're not constrained with compatibility, go for the native ones as they are the way forward afaik.
Collation decides sort order and equality. You choose, what really best suits your users. It's understood that you will use the unicode types (like nvarchar) for your data to support international text. Collation affects what can be stored in a non-unicode column, which does not affect you then.
What really matters is that you avoid mixing collations in WHERE clause because that's where you pay the fine by not using indexes. Afaik there's no silver bullet collation to support all languages. You can either choose one for the majority of your users or go into localization support with different column for each language.
One important thing is to have the server collation the same as your database collation. It will make your life much easier if you plan to use temporary tables as temporary tables if created with "CREATE TABLE #ttt..." pick up the server collation and you'd run into collation conflicts which you'll need to solve with specifying an explicit collation. This has a performance impact too.
请不要认为我的答案是完整的,但您应该考虑以下几点:(
nvarchar
数据类型。这将允许您存储任何语言的任何字符,如UTF-8\unicode
字符集所定义!如果不这样做,您将无法在表格中混合不同来源的文本(拉丁语、西里尔语、阿拉伯语等)。也就是说,您的排序规则选择将主要影响以下方面:
这个(空格被视为“第一等级”字符)?
或者这个(排序时不考虑空格)?
Please do not consider my answer as complete, but you should take into consideration the following points:
nvarchar
data type. This will allow you to store any character from any language, as defined byUTF-8\unicode
character set! If you do not do so, you will not be able to mix text from different origins (latin, cyrillic, arabic, etc) in your tables.This said, your collation choice will mainly affect the following:
this one (spaces are considered as 'first rank' characters)?
or this one (spaces are not considered in the ordering)?
全球数据库(例如网站)的最佳默认排序规则可能是
Latin1_General_CI_AS
。比排序规则更重要的是确保所有文本列都使用nvarchar
数据类型。The best default collation for a global database (e.g. a website) is probably
Latin1_General_CI_AS
. More important than collation is making sure that all textual columns use thenvarchar
data type.只要您使用 NVARCHAR 列(对于混合国际数据应该如此),所有 *_BIN 和 *_BIN2 排序规则都会基于 Unicode 代码点执行相同的二进制比较/排序。您选择哪一个并不重要。 Latin1_General_BIN2 看起来是一个合理的通用选择。
来源: http://msdn.microsoft.com/ en-us/library/ms143350(v=sql.105).aspx
As long as you use NVARCHAR columns (as you should for mixed international data), all *_BIN and *_BIN2 collations perform the same binary comparison/sorting based on the Unicode code points. It doesn't matter which one you pick. Latin1_General_BIN2 looks like a reasonable generic choice.
Source: http://msdn.microsoft.com/en-us/library/ms143350(v=sql.105).aspx