SQL Server SQL_Latin1_General_CP1_CI_AS 能否安全地转换为 Latin1_General_CI_AS?
我们有一个旧数据库,其中一些(较旧的)列使用“SQL_Latin1_General_CP1_CI_AS”,最近的更改使用“Latin1_General_CI_AS”。
这是一个痛苦,因为连接需要额外的 COLLATE 语句才能工作。
我想把一切都带到“Latin1_General_CI_AS”。据我所知,它们或多或少是相同的排序规则,并且在此过程中我不会丢失数据......
有谁知道情况是否如此?
We have a legacy database with some (older) columns using "SQL_Latin1_General_CP1_CI_AS" and more recent changes have used "Latin1_General_CI_AS".
This is a pain as joins need the additional COLLATE statement to work.
I'd like to bring everything up to "Latin1_General_CI_AS". From what I can gather they are more or less identical collations and I won't lose data during this process...
Does anyone know if this is the case?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这是更完整的答案:
https://www.olcot.co.uk/revised-difference- Between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/
这些排序规则之间的主要区别在于它们如何应用字符扩展规则。某些拉丁字符可以扩展为多个字符。在处理非 unicode 文本时,SQL_xxxx 排序规则可能会忽略这些字符扩展,但将它们应用于 unicode 文本。因此:当使用一种排序规则与另一种排序规则时,连接、排序和比较可能会返回不同的结果。
示例:
在
Latin1_General_CI_AS
下,这两个语句返回同一组记录,因为ß
扩展为ss
。使用
SQL_Latin1_General_CP1_CI_AS
时,上述语句返回不同的记录,因为ß
被视为与ss
不同的字符。Here is a more complete answer:
https://www.olcot.co.uk/revised-difference-between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/
The key difference between these collations is in how they apply character expansion rules. Certain Latin characters may be expanded into multiple characters. The SQL_xxxx collations may ignore these character expansions when working with non-unicode text, but apply them for unicode text. As a result: joins, sorts, and comparisons may return different results when using one collation versus the other.
Example:
Under
Latin1_General_CI_AS
these two statements return the same set of records, asß
is expanded toss
.When using
SQL_Latin1_General_CP1_CI_AS
the above statements return different records, since theß
is treated as a different character thanss
.如果您要更改数据库的排序规则,那么您肯定应该了解一些内容,以便您可以做出相应的计划:
关于数据丢失的可能性:
NVARCHAR
字段都是 Unicode,这是一个单一字符集,因此这些字段不会有任何数据丢失(这也包括也存储为 UTF-16 Little 的 XML 字段)字节序)。存储对象/列/索引/等名称的元数据字段都是NVARCHAR
,因此无需担心这些。VARCHAR
字段具有不同的排序规则,但不同排序规则之间的相同代码页不会出现问题,因为代码页是字符集。VARCHAR
字段具有不同的排序规则并移动到不同的代码页(更改排序规则时)可能 如果正在使用的任何字符未在新的代码页。但是,这只是在物理更改特定字段的排序规则(如下所述)时出现的问题,并且在更改数据库的默认排序规则时不会发生。局部变量和字符串文字从数据库默认值获取其排序规则。更改数据库默认值将更改用于局部变量和字符串文字的排序规则。但是更改数据库的默认排序规则不会更改该数据库中表中现有字符串列所使用的排序规则。当将列与文字和/或变量进行比较或连接时,这通常不会导致任何问题,因为文字和变量将由于排序优先级而采用列的排序规则。唯一的潜在问题是 128 - 255 之间的值的字符可能会发生代码页转换,这些字符在列的排序规则使用的代码页中不可用。
如果您期望列的谓词/比较/排序/串联等在更改数据库的默认排序规则时表现不同,那么您将需要使用以下命令显式更改该列的排序规则:
请务必指定当前正在使用的完全相同数据类型和
NULL
/NOT NULL
设置,否则它们可能会恢复为如果还不是默认值,则为默认值。之后,如果任何字符串列上有任何索引刚刚更改了排序规则,那么您需要重建这些索引。更改数据库的默认排序规则将更改某些特定于数据库的元数据的排序规则,例如
sys.objects
、sys 中的
、name
字段.columnssys.indexes
等。根据局部变量或字符串文字过滤这些系统视图不会成为问题,因为排序规则将在两侧发生变化。但是,如果您将任何本地系统视图联接到字符串字段上的临时表,并且本地数据库和tempdb
之间的数据库级排序规则不匹配,那么您将得到“排序规则不匹配” “ 错误。下面将讨论这一点以及补救措施。这两种排序规则之间的一个区别在于它们如何对
VARCHAR
数据的某些字符进行排序(这不会影响NVARCHAR
数据)。非 EBCDICSQL_
排序规则对VARCHAR
数据使用所谓的“字符串排序”,而所有其他排序规则,甚至对NVARCHAR
数据进行排序非 EBCDICSQL_
排序规则,使用所谓的“词排序”。不同之处在于,在“单词排序”中,破折号-
和撇号'
(也许还有其他一些字符?)的权重非常低,基本上会被忽略,除非字符串没有其他差异。要查看此行为的实际效果,请运行以下命令:返回:
和:
虽然您将“失去”“字符串排序”行为,但我不确定是否会将其称为“功能”。这种行为被认为是不可取的(事实证明它没有被引入任何 Windows 排序规则中)。但是,这两种排序规则之间存在明显的行为差异(同样,仅适用于非 EBCDIC
VARCHAR
数据),并且您可能有基于代码和/或客户期望的代码和/或客户期望。基于“字符串排序”行为。 这需要测试您的代码,并可能需要研究一下这种行为变化是否会对用户产生负面影响。SQL_Latin1_General_CP1_CI_AS
和Latin1_General_100_CI_AS< 之间的另一个区别/code> 是执行 对
VARCHAR
数据进行扩展(NVARCHAR
数据已经可以对大多数SQL_
排序规则执行这些操作),例如将æ
处理为ae
:返回:
您在这里“失去”的唯一一件事是无法进行这些扩展。一般来说,这是转向 Windows 排序规则的另一个好处。然而,就像从“字符串排序”到“单词排序”的移动一样,同样需要注意:两种排序规则之间存在明显的行为差异(同样,仅针对
VARCHAR
数据),并且您可能有基于不这些映射的代码和/或客户期望。 这需要测试您的代码,并可能需要进行研究,看看这种行为变化是否会对用户产生负面影响。(首先在@Zarepheth的答案中指出,并在此处进行扩展)
另一个区别(这也是迁移到 Windows 排序规则的好处)是过滤在
NVARCHAR
文字/变量/列上建立索引的VARCHAR
列,您将不再使VARCHAR
列上的索引无效。这是由于 Windows 排序规则对VARCHAR
和NVARCHAR
数据使用相同的 Unicode 排序和比较规则。由于两种类型之间的排序顺序相同,因此当VARCHAR
数据转换为NVARCHAR
(由于数据类型优先级而显式或隐式)时,索引仍然有效。有关此行为的更多详细信息,请参阅我的帖子:混合 VARCHAR 和 NVARCHAR 类型时对索引的影响。服务器级Collation用于设置系统数据库的Collation,包括
[model]
。[model]
数据库用作创建新数据库的模板,其中包括每次服务器启动时的[tempdb]
。因此,如果数据库的默认排序规则与实例的默认排序规则不匹配,并且您将本地表连接到字符串字段上的临时表,那么您将收到排序规则不匹配错误。幸运的是,有一种简单的方法可以纠正执行CREATE #TempTable
时的“当前”数据库与[tempdb]
之间的排序规则差异。创建临时表时,使用 COLLATE 子句声明排序规则(在字符串列上),并使用特定排序规则(如果您知道数据库将始终使用该排序规则)或 DATABASE_DEFAULT (如果您并不总是知道将执行此代码的数据库的排序规则):这对于表变量来说不是必需的,因为它们从“当前”数据库获取默认排序规则。但是,如果您同时拥有表变量和临时表,并将它们连接到字符串字段,那么您将需要使用
COLLATE {specic_collation}
或COLLATE DATABASE_DEFAULT
,如上面直接所示。服务器级排序规则还控制局部变量名称、
CURSOR
变量名称和GOTO
标签。虽然这些都不会受到本问题中处理的具体更改的影响,但至少需要注意一些事情。如果有多个版本可用,最好使用所需排序规则的最新版本。从 SQL Server 2005 开始,引入了“90”系列排序规则,SQL Server 2008 引入了“100”系列排序规则。您可以使用以下查询找到这些排序规则:
此外,虽然问题询问不区分大小写的排序规则,但应该注意的是,如果其他人想要进行类似的更改但正在使用区分大小写的排序规则,那么SQL Server 排序规则和 Windows 排序规则之间的另一个区别(仅适用于
VARCHAR
数据)是大小写先排序。这意味着,如果您同时拥有A
和a
,则SQL_
排序规则会将A
排序在a< 之前/code>,而非
SQL_
排序规则(以及处理NVARCHAR
数据时的SQL_
排序规则)将对a
进行排序代码>在A
之前。有关更改数据库或整个实例的排序规则的更多信息和详细信息,请参阅我的帖子:
更改所有用户数据库中实例、数据库和所有列的排序规则:可能会发生什么错了?
有关使用字符串和排序规则的更多信息,请访问:排序规则信息
If you are going to change the Collation of a Database, then there is definitely stuff you should know about so that you can plan accordingly:
Regarding data-loss potential:
NVARCHAR
fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc names are allNVARCHAR
so no need to worry about those.VARCHAR
fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.VARCHAR
fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.
If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:
Be sure to specify the exact same datatype and
NULL
/NOT NULL
setting that are currently being used, else they can revert to the default if not already being the default value. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the
name
field in bothsys.objects
,sys.columns
,sys.indexes
, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database andtempdb
doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.One difference between these two Collations is in how they sort certain characters for
VARCHAR
data (this does not affectNVARCHAR
data). The non-EBCDICSQL_
Collations use what is called "String Sort" forVARCHAR
data, while all other Collations, and evenNVARCHAR
data for the non-EBCDICSQL_
Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash-
and apostrophe'
(and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:Returns:
and:
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC
VARCHAR
data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.Another difference between
SQL_Latin1_General_CP1_CI_AS
andLatin1_General_100_CI_AS
is the ability to do Expansions onVARCHAR
data (NVARCHAR
data can already do these for mostSQL_
Collations), such as handlingæ
as if it wereae
:Returns:
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for
VARCHAR
data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.(first noted in @Zarepheth's answer and expanded on here)
Another difference (that is also a benefit of moving to a Windows Collation) is that filtering a
VARCHAR
column that is indexed onNVARCHAR
literal / variable / column you will no longer invalidate the index on theVARCHAR
column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for bothVARCHAR
andNVARCHAR
data. Because the sort order is the same between the two types, when theVARCHAR
data gets converted intoNVARCHAR
(explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.The server-level Collation is used to set the Collation of the system databases, which includes
[model]
. The[model]
database is used as a template to create new databases, which includes[tempdb]
upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" whenCREATE #TempTable
is executed and[tempdb]
. When creating temporary tables, declare a collation (on string columns) using theCOLLATE
clause and use either a specific collation (if you know that the DB will always be using that collation), orDATABASE_DEFAULT
(if you don't always know the collation of the DB where this code will execute):This is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use
COLLATE {specific_collation}
orCOLLATE DATABASE_DEFAULT
as shown directly above.The server-level collation also controls local variable names,
CURSOR
variable names, andGOTO
labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for
VARCHAR
data only, is which case sorts first. Meaning, if you have bothA
anda
, theSQL_
Collations will sortA
beforea
, while the non-SQL_
Collations (and theSQL_
Collations when dealing withNVARCHAR
data) will sorta
beforeA
.For a lot more info and details on changing the Collation of a Database or of the entire Instance, please see my post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with strings and collations, please visit: Collations Info
此 MSDN 论坛上有更多信息:
http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/
其中指出:
因此,在我看来,您不应该看到差异,特别是如果您的数据仅为 a-z0-9
There is more info on this MSDN forum:
http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/
Which states:
Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9
...给出...
Latin1_General_CI_AS:
Latin1-General、不区分大小写、区分重音、不区分假名类型、不区分宽度
SQL_Latin1_General_CP1_CI_AS:
Latin1-General、不区分大小写、区分重音、不区分假名类型、不区分宽度对于 Unicode 数据,对于非 Unicode 数据,代码页 1252 上的 SQL Server 排序顺序 52
因此,我会推断使用的代码页是相同的 (Latin1-General => 1252),因此您应该不会遇到数据丢失 - 如果转换后有任何更改,则可能是排序顺序 - 这可能并不重要。
...gives...
Latin1_General_CI_AS:
Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
SQL_Latin1_General_CP1_CI_AS:
Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
So from this, I would infer that the code page used is the same (Latin1-General => 1252), so you should encounter no loss of data - if anything were to change post-conversion it might be the sort order - which is probably immaterial.
为此,请转到数据库的属性并选择选项。
然后将集合类型更改为 SQL_Latin1_General_CP1_CS_AS。
To do that go to properties of your data base and select options.
Then change the collection type to SQL_Latin1_General_CP1_CS_AS.