SQL Server SQL_Latin1_General_CP1_CI_AS 能否安全地转换为 Latin1_General_CI_AS?

发布于 2024-11-15 05:41:08 字数 231 浏览 1 评论 0原文

我们有一个旧数据库,其中一些(较旧的)列使用“SQL_Latin1_General_CP1_CI_AS”,最近的更改使用“Latin1_General_CI_AS”。

这是一个痛苦,因为连接需要额外的 COLLATE 语句才能工作。

我想把一切都带到“Latin1_General_CI_AS”。据我所知,它们或多或少是相同的排序规则,并且在此过程中我不会丢失数据......

有谁知道情况是否如此?

We have a legacy database with some (older) columns using "SQL_Latin1_General_CP1_CI_AS" and more recent changes have used "Latin1_General_CI_AS".

This is a pain as joins need the additional COLLATE statement to work.

I'd like to bring everything up to "Latin1_General_CI_AS". From what I can gather they are more or less identical collations and I won't lose data during this process...

Does anyone know if this is the case?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

当梦初醒 2024-11-22 05:41:08

这是更完整的答案:

https://www.olcot.co.uk/revised-difference- Between-collat​​ion-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/

这些排序规则之间的主要区别在于它们如何应用字符扩展规则。某些拉丁字符可以扩展为多个字符。在处理非 unicode 文本时,SQL_xxxx 排序规则可能会忽略这些字符扩展,但将它们应用于 unicode 文本。因此:当使用一种排序规则与另一种排序规则时,连接、排序和比较可能会返回不同的结果。

示例:

Latin1_General_CI_AS 下,这两个语句返回同一组记录,因为 ß 扩展为 ss

SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'

使用 SQL_Latin1_General_CP1_CI_AS 时,上述语句返回不同的记录,因为 ß 被视为与 ss 不同的字符。

Here is a more complete answer:

https://www.olcot.co.uk/revised-difference-between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/

The key difference between these collations is in how they apply character expansion rules. Certain Latin characters may be expanded into multiple characters. The SQL_xxxx collations may ignore these character expansions when working with non-unicode text, but apply them for unicode text. As a result: joins, sorts, and comparisons may return different results when using one collation versus the other.

Example:

Under Latin1_General_CI_AS these two statements return the same set of records, as ß is expanded to ss.

SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'

When using SQL_Latin1_General_CP1_CI_AS the above statements return different records, since the ß is treated as a different character than ss.

人生戏 2024-11-22 05:41:08

如果您要更改数据库的排序规则,那么您肯定应该了解一些内容,以便您可以做出相应的计划:

  • 关于数据丢失的可能性:

    • NVARCHAR 字段都是 Unicode,这是一个单一字符集,因此这些字段不会有任何数据丢失(这也包括也存储为 UTF-16 Little 的 XML 字段)字节序)。存储对象/列/索引/等名称的元数据字段都是NVARCHAR,因此无需担心这些。
    • VARCHAR 字段具有不同的排序规则,但不同排序规则之间的相同代码页不会出现问题,因为代码页是字符集。
    • VARCHAR 字段具有不同的排序规则并移动到不同的代码页(更改排序规则时)可能 如果正在使用的任何字符未在新的代码页。但是,这只是在物理更改特定字段的排序规则(如下所述)时出现的问题,并且在更改数据库的默认排序规则时不会发生。
  • 局部变量和字符串文字从数据库默认值获取其排序规则。更改数据库默认值将更改用于局部变量和字符串文字的排序规则。但是更改数据库的默认排序规则不会更改该数据库中表中现有字符串列所使用的排序规则。当将列与文字和/或变量进行比较或连接时,这通常不会导致任何问题,因为文字和变量将由于排序优先级而采用列的排序规则。唯一的潜在问题是 128 - 255 之间的值的字符可能会发生代码页转换,这些字符在列的排序规则使用的代码页中不可用。

  • 如果您期望列的谓词/比较/排序/串联等在更改数据库的默认排序规则时表现不同,那么您将需要使用以下命令显式更改该列的排序规则:

    更改表 [{table_name}]
       更改列 [{column_name}]
       {相同的数据类型}
       {same_NULL_or_NOT NULL_设置}
       整理 {name_of_Database_default_Collat​​ion};
    

    请务必指定当前正在使用的完全相同数据类型和NULL / NOT NULL设置,否则它们可能会恢复为如果还不是默认值,则为默认值。之后,如果任何字符串列上有任何索引刚刚更改了排序规则,那么您需要重建这些索引。

  • 更改数据库的默认排序规则将更改某些特定于数据库的元数据的排序规则,例如 sys.objectssys 中的 name 字段.columnssys.indexes 等。根据局部变量或字符串文字过滤这些系统视图不会成为问题,因为排序规则将在两侧发生变化。但是,如果您将任何本地系统视图联接到字符串字段上的临时表,并且本地数据库和 tempdb 之间的数据库级排序规则不匹配,那么您将得到“排序规则不匹配” “ 错误。下面将讨论这一点以及补救措施。

  • 这两种排序规则之间的一个区别在于它们如何对 VARCHAR 数据的某些字符进行排序(这不会影响 NVARCHAR 数据)。非 EBCDIC SQL_ 排序规则对 VARCHAR 数据使用所谓的“字符串排序”,而所有其他排序规则,甚至对 NVARCHAR 数据进行排序非 EBCDIC SQL_ 排序规则,使用所谓的“词排序”。不同之处在于,在“单词排序”中,破折号 - 和撇号 ' (也许还有其他一些字符?)的权重非常低,基本上会被忽略,除非字符串没有其他差异。要查看此行为的实际效果,请运行以下命令:

    声明 @Test TABLE (Col1 VARCHAR(10) NOT NULL);
    插入@Test VALUES ('aa');
    插入@Test VALUES ('ac');
    插入@Test VALUES('啊');
    插入@Test VALUES ('am');
    插入@Test VALUES('又名');
    插入@Test VALUES ('akc');
    插入@Test VALUES ('ar');
    插入@Test VALUES ('af');
    插入@Test VALUES ('a_e');
    插入@Test VALUES ('a''kb');
    
    SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS;
    --“字符串排序”将所有标点符号放在字母之前
    
    SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS;
    --“单词排序”大多忽略破折号和撇号
    

    返回:

    字符串排序
    ------------
    a'kb
    AF
    一个_e
    啊
    交流电
    啊
    又名
    AKC
    是
    阿尔
    

    和:

    词排序
    ---------
    一个_e
    啊
    交流电
    AF
    啊
    又名
    a'kb
    AKC
    是
    阿尔
    

    虽然您将“失去”“字符串排序”行为,但我不确定是否会将其称为“功能”。这种行为被认为是不可取的(事实证明它没有被引入任何 Windows 排序规则中)。但是,这两种排序规则之间存在明显的行为差异(同样,仅适用于非 EBCDIC VARCHAR 数据),并且您可能有基于代码和/或客户期望的代码和/或客户期望。基于“字符串排序”行为。 这需要测试您的代码,并可能需要研究一下这种行为变化是否会对用户产生负面影响。

  • SQL_Latin1_General_CP1_CI_ASLatin1_General_100_CI_AS< 之间的另一个区别/code> 是执行 VARCHAR 数据进行扩展NVARCHAR 数据已经可以对大多数 SQL_ 排序规则执行这些操作),例如将 æ 处理为 ae

    IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS =
        'ae' 整理 SQL_Latin1_General_CP1_CI_AS)
    开始
      打印'SQL_Latin1_General_CP1_CI_AS';
    结尾;
    
    IF ('æ' COLLATE Latin1_General_100_CI_AS =
        'ae' COLLATE Latin1_General_100_CI_AS)
    开始
      打印“Latin1_General_100_CI_AS”;
    结尾;
    

    返回:

    Latin1_General_100_CI_AS
    

    您在这里“失去”的唯一一件事是无法进行这些扩展。一般来说,这是转向 Windows 排序规则的另一个好处。然而,就像从“字符串排序”到“单词排序”的移动一样,同样需要注意:两种排序规则之间存在明显的行为差异(同样,仅针对 VARCHAR 数据),并且您可能有基于这些映射的代码和/或客户期望。 这需要测试您的代码,并可能需要进行研究,看看这种行为变化是否会对用户产生负面影响。

    (首先在@Zarepheth的答案中指出,并在此处进行扩展)

  • 另一个区别(这也是迁移到 Windows 排序规则的好处)是过滤在 NVARCHAR 文字/变量/列上建立索引的 VARCHAR 列,您将不再使 VARCHAR 列上的索引无效。这是由于 Windows 排序规则对 VARCHARNVARCHAR 数据使用相同的 Unicode 排序和比较规则。由于两种类型之间的排序顺序相同,因此当 VARCHAR 数据转换为 NVARCHAR(由于数据类型优先级而显式或隐式)时,索引仍然有效。有关此行为的更多详细信息,请参阅我的帖子:混合 VARCHAR 和 NVARCHAR 类型时对索引的影响

  • 服务器级Collat​​ion用于设置系统数据库的Collat​​ion,包括[model][model] 数据库用作创建新数据库的模板,其中包括每次服务器启动时的[tempdb]。因此,如果数据库的默认排序规则与实例的默认排序规则不匹配,并且您将本地表连接到字符串字段上的临时表,那么您将收到排序规则不匹配错误。幸运的是,有一种简单的方法可以纠正执行 CREATE #TempTable 时的“当前”数据库与 [tempdb] 之间的排序规则差异。创建临时表时,使用 COLLATE 子句声明排序规则(在字符串列上),并使用特定排序规则(如果您知道数据库将始终使用该排序规则)或 DATABASE_DEFAULT (如果您并不总是知道将执行此代码的数据库的排序规则):

    CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
    

    这对于表变量来说不是必需的,因为它们从“当前”数据库获取默认排序规则。但是,如果您同时拥有表变量和临时表,并将它们连接到字符串字段,那么您将需要使用 COLLATE {specic_collat​​ion}COLLATE DATABASE_DEFAULT,如上面直接所示。

  • 服务器级排序规则还控制局部变量名称、CURSOR 变量名称和GOTO 标签。虽然这些都不会受到本问题中处理的具体更改的影响,但至少需要注意一些事情。

  • 如果有多个版本可用,最好使用所需排序规则的最新版本。从 SQL Server 2005 开始,引入了“90”系列排序规则,SQL Server 2008 引入了“100”系列排序规则。您可以使用以下查询找到这些排序规则:

    从 sys.fn_helpcollat​​ions() 中选择 *,其中 [name] LIKE N'%[_]90[_]%'; -- 476
    
    SELECT * FROM sys.fn_helpcollat​​ions() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
    
  • 此外,虽然问题询问不区分大小写的排序规则,但应该注意的是,如果其他人想要进行类似的更改但正在使用区分大小写的排序规则,那么SQL Server 排序规则和 Windows 排序规则之间的另一个区别(仅适用于 VARCHAR 数据)是大小写先排序。这意味着,如果您同时拥有 Aa,则 SQL_ 排序规则会将 A 排序在 a< 之前/code>,而非 SQL_ 排序规则(以及处理 NVARCHAR 数据时的 SQL_ 排序规则)将对 a 进行排序代码>在A之前。

有关更改数据库或整个实例的排序规则的更多信息和详细信息,请参阅我的帖子:
更改所有用户数据库中实例、数据库和所有列的排序规则:可能会发生什么错了?

有关使用字符串和排序规则的更多信息,请访问:排序规则信息

If you are going to change the Collation of a Database, then there is definitely stuff you should know about so that you can plan accordingly:

  • Regarding data-loss potential:

    • NVARCHAR fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc names are all NVARCHAR so no need to worry about those.
    • VARCHAR fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.
    • VARCHAR fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.
  • Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.

  • If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:

    ALTER TABLE [{table_name}]
       ALTER COLUMN [{column_name}]
       {same_datatype}
       {same_NULL_or_NOT NULL_setting}
       COLLATE {name_of_Database_default_Collation};
    

    Be sure to specify the exact same datatype and NULL / NOT NULL setting that are currently being used, else they can revert to the default if not already being the default value. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.

  • Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the name field in both sys.objects, sys.columns, sys.indexes, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database and tempdb doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.

  • One difference between these two Collations is in how they sort certain characters for VARCHAR data (this does not affect NVARCHAR data). The non-EBCDIC SQL_ Collations use what is called "String Sort" for VARCHAR data, while all other Collations, and even NVARCHAR data for the non-EBCDIC SQL_ Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash - and apostrophe ' (and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:

    DECLARE @Test TABLE (Col1 VARCHAR(10) NOT NULL);
    INSERT INTO @Test VALUES ('aa');
    INSERT INTO @Test VALUES ('ac');
    INSERT INTO @Test VALUES ('ah');
    INSERT INTO @Test VALUES ('am');
    INSERT INTO @Test VALUES ('aka');
    INSERT INTO @Test VALUES ('akc');
    INSERT INTO @Test VALUES ('ar');
    INSERT INTO @Test VALUES ('a-f');
    INSERT INTO @Test VALUES ('a_e');
    INSERT INTO @Test VALUES ('a''kb');
    
    SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS;
    -- "String Sort" puts all punctuation ahead of letters
    
    SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS;
    -- "Word Sort" mostly ignores dash and apostrophe
    

    Returns:

    String Sort
    -----------
    a'kb
    a-f
    a_e
    aa
    ac
    ah
    aka
    akc
    am
    ar
    

    and:

    Word Sort
    ---------
    a_e
    aa
    ac
    a-f
    ah
    aka
    a'kb
    akc
    am
    ar
    

    While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC VARCHAR data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.

  • Another difference between SQL_Latin1_General_CP1_CI_AS and Latin1_General_100_CI_AS is the ability to do Expansions on VARCHAR data (NVARCHAR data can already do these for most SQL_ Collations), such as handling æ as if it were ae:

    IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS =
        'ae' COLLATE SQL_Latin1_General_CP1_CI_AS)
    BEGIN
      PRINT 'SQL_Latin1_General_CP1_CI_AS';
    END;
    
    IF ('æ' COLLATE Latin1_General_100_CI_AS =
        'ae' COLLATE Latin1_General_100_CI_AS)
    BEGIN
      PRINT 'Latin1_General_100_CI_AS';
    END;
    

    Returns:

    Latin1_General_100_CI_AS
    

    The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for VARCHAR data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.

    (first noted in @Zarepheth's answer and expanded on here)

  • Another difference (that is also a benefit of moving to a Windows Collation) is that filtering a VARCHAR column that is indexed on NVARCHAR literal / variable / column you will no longer invalidate the index on the VARCHAR column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for both VARCHAR and NVARCHAR data. Because the sort order is the same between the two types, when the VARCHAR data gets converted into NVARCHAR (explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.

  • The server-level Collation is used to set the Collation of the system databases, which includes [model]. The [model] database is used as a template to create new databases, which includes [tempdb] upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" when CREATE #TempTable is executed and [tempdb]. When creating temporary tables, declare a collation (on string columns) using the COLLATE clause and use either a specific collation (if you know that the DB will always be using that collation), or DATABASE_DEFAULT (if you don't always know the collation of the DB where this code will execute):

    CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
    

    This is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use COLLATE {specific_collation} or COLLATE DATABASE_DEFAULT as shown directly above.

  • The server-level collation also controls local variable names, CURSOR variable names, and GOTO labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.

  • It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:

    SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]90[_]%'; -- 476
    
    SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
    
  • ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for VARCHAR data only, is which case sorts first. Meaning, if you have both A and a, the SQL_ Collations will sort A before a, while the non-SQL_ Collations (and the SQL_ Collations when dealing with NVARCHAR data) will sort a before A.

For a lot more info and details on changing the Collation of a Database or of the entire Instance, please see my post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?

For more info on working with strings and collations, please visit: Collations Info

过期以后 2024-11-22 05:41:08

此 MSDN 论坛上有更多信息:

http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/

其中指出:

如果排序规则是 SQL_Latin1_General_CP1_CI_AS 或 Latin1_General_CI_AS,您应该会发现差别不大,但两者都有比另一个更快或更慢的情况。

Latin1_General_CI_AS :- Latin1-通用,不区分大小写,重音-
敏感、假名不敏感、宽度不敏感

SQL_Latin1_General_CP1_CI_AS:- Latin1-General,不区分大小写,
Unicode 区分重音、不区分假名、不区分宽度
数据,对于非 Unicode 数据,代码页 1252 上的 SQL Server 排序顺序 52

因此,在我看来,您不应该看到差异,特别是如果您的数据仅为 a-z0-9

There is more info on this MSDN forum:

http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/

Which states:

You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.

Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent-
sensitive, kanatype-insensitive, width-insensitive

SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive,
accent-sensitive, kanatype-insensitive, width-insensitive for Unicode
Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9

习惯成性 2024-11-22 05:41:08
SELECT * FROM ::fn_helpcollations()
WHERE name IN (
'SQL_Latin1_General_CP1_CI_AS',
'Latin1_General_CI_AS'
)

...给出...

Latin1_General_CI_AS:
Latin1-General、不区分大小写、区分重音、不区分假名类型、不区分宽度

SQL_Latin1_General_CP1_CI_AS:
Latin1-General、不区分大小写、区分重音、不区分假名类型、不区分宽度对于 Unicode 数据,对于非 Unicode 数据,代码页 1252 上的 SQL Server 排序顺序 52

因此,我会推断使用的代码页是相同的 (Latin1-General => 1252),因此您应该不会遇到数据丢失 - 如果转换后有任何更改,则可能是排序顺序 - 这可能并不重要。

SELECT * FROM ::fn_helpcollations()
WHERE name IN (
'SQL_Latin1_General_CP1_CI_AS',
'Latin1_General_CI_AS'
)

...gives...

Latin1_General_CI_AS:
Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive

SQL_Latin1_General_CP1_CI_AS:
Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

So from this, I would infer that the code page used is the same (Latin1-General => 1252), so you should encounter no loss of data - if anything were to change post-conversion it might be the sort order - which is probably immaterial.

a√萤火虫的光℡ 2024-11-22 05:41:08

为此,请转到数据库的属性并选择选项。

然后将集合类型更改为 SQL_Latin1_General_CP1_CS_AS。

输入图片此处描述

To do that go to properties of your data base and select options.

Then change the collection type to SQL_Latin1_General_CP1_CS_AS.

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文