Why is SQL CE altering my Unicode value?

Posted on 2024-10-17 07:59:23

I'm ingesting unstructured text from various documents and sources. I'm using SQL Server Compact Edition 3.5.

I'm creating a lookup table of unique words and referencing them via an ID, which is an identity column. The problem I'm running into seems to be related to Unicode.

Upon inserting the term "deﬁnitions", I hit the unique key constraint against a row that already contains "definitions". If you inspect the two closely, you'll find that the value I'm inserting contains not an 'f' and an 'i' but a single ligature character, 'ﬁ'. Yet SqlCe seems to be converting it to an 'f' and an 'i'. There are ten characters in one word and eleven in the other, but SqlCe sees them as the same.

The table column is nvarchar.

I specified the parameter as nvarchar.

The query is very straightforward:

            cmd.CommandText = "INSERT INTO dictionary(lemma) VALUES(?);";

            DbParameter lemma = cmd.CreateParameter();
            cmd.Parameters.Add(lemma);

            for (int i = 0; i < terms.Count; i++)
            {
                lemma.Value = terms[i].Key;
                cmd.ExecuteNonQuery();
            }

I've also tried:

            cmd.CommandText = "INSERT INTO dictionary(lemma) VALUES(?);";

            SqlCeParameter lemma = new SqlCeParameter("lemma", SqlDbType.NVarChar);
            cmd.Parameters.Add(lemma);

            for (int i = 0; i < terms.Count; i++)
            {
                lemma.Value = terms[i].Key;
                cmd.ExecuteNonQuery();
            }

In what I'm inserting, the bytes for the 'ﬁ' character are 1 251, as opposed to 102 0, 105 0 for 'f' and 'i'.
See the following:

 {byte[20]}
 [0]: 100
 [1]: 0
 [2]: 101
 [3]: 0
 [4]: 1
 [5]: 251
 [6]: 110
 [7]: 0
 [8]: 105
 [9]: 0
 [10]: 116
 [11]: 0
 [12]: 105
 [13]: 0
 [14]: 111
 [15]: 0
 [16]: 110
 [17]: 0
 [18]: 115
 [19]: 0

Whereas the value in the database (the one SqlCe is seeing as a violation of a unique key) is:

{byte[22]}
[0]: 100
[1]: 0
[2]: 101
[3]: 0
[4]: 102
[5]: 0
[6]: 105
[7]: 0
[8]: 110
[9]: 0
[10]: 105
[11]: 0
[12]: 116
[13]: 0
[14]: 105
[15]: 0
[16]: 111
[17]: 0
[18]: 110
[19]: 0
[20]: 115
[21]: 0
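
For reference, the two dumps above can be reproduced outside the database. The following is a minimal sketch, assuming the bytes were taken with `Encoding.Unicode` (UTF-16LE); the class and method names are hypothetical and not part of the original code.

```csharp
using System;
using System.Text;

public static class LigatureBytes
{
    // UTF-16LE bytes of a string, the same layout as the debugger dumps:
    // low byte first, then high byte, for every UTF-16 code unit.
    public static byte[] Utf16Bytes(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }

    // Ordinal comparison looks only at the UTF-16 code units and applies
    // no culture-aware folding of ligatures.
    public static bool OrdinalEquals(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

With these helpers, `LigatureBytes.Utf16Bytes("de\uFB01nitions")` yields 20 bytes with 1 and 251 at indexes 4 and 5, `LigatureBytes.Utf16Bytes("definitions")` yields 22 bytes, and `LigatureBytes.OrdinalEquals` reports the two words as different. This suggests the strings reach the database intact and that the match happens in the comparison, not in the data.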

How can I get SQL Server CE to insert the value correctly?

EDIT: Corrected the code shown above.

怪异←思 2024-10-24 07:59:23

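SQL CE is not altering any of your values. The problem is that the column has a unique key constraint, and SQL CE decides whether two nvarchar values are equal by using a collation. By default, strings are matched in a culture-aware way, so 'ﬁ' = 'fi', 'Å' as a single code point = 'A' followed by a combining ring, and so on.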

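I'm not aware of any SQL collation that treats every Unicode value as distinct. If that is really what you want, you'll have to store the data as VarBinary and do binary comparisons.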
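
A minimal sketch of what the VarBinary route could look like on the application side, assuming the lemma column is changed to varbinary and the byte array is passed as the parameter value. `BinaryLemmaKey` and its methods are hypothetical names, and how SQL CE indexes such a column has not been verified here.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryLemmaKey
{
    // Encode the lemma as raw UTF-16LE bytes. This is the value that would
    // be stored in a varbinary column instead of nvarchar (an assumption
    // about the schema, not something shown in the question).
    public static byte[] ToKey(string lemma)
    {
        return Encoding.Unicode.GetBytes(lemma);
    }

    // Recover the original text from the stored bytes.
    public static string FromKey(byte[] key)
    {
        return Encoding.Unicode.GetString(key);
    }

    // Byte-for-byte equality, with no collation involved.
    public static bool SameKey(byte[] a, byte[] b)
    {
        return a.SequenceEqual(b);
    }
}
```

Under this scheme "deﬁnitions" and "definitions" produce different keys, so a byte-wise unique constraint would accept both. The trade-off is that the column is no longer searchable as text without decoding it first.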

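Reconsider whether you really want to compare characters as binary, though. SQL defines VarChar as human-readable text, and both SQL and Unicode specify that corresponding ligatures, diacritics, and so on be treated as matching strings. That makes sense: people do read them as the same, and in most fonts they are indistinguishable.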
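
If treating such variants as the same word is acceptable, the equivalence can be made explicit before the INSERT, so duplicates are caught in application code instead of by the constraint. The sketch below uses Unicode NFKC normalization via `string.Normalize`, which is a stand-in technique and not necessarily identical to the rules of SQL CE's collation; `LemmaNormalizer` is a hypothetical helper.

```csharp
using System;
using System.Text;

public static class LemmaNormalizer
{
    // NFKC applies compatibility mappings and then composes, so U+FB01
    // becomes "fi" and 'A' followed by U+030A becomes the single code
    // point U+00C5.
    public static string Canonical(string lemma)
    {
        return lemma.Normalize(NormalizationForm.FormKC);
    }

    // Two lemmas count as the same word when their normalized forms are
    // identical code unit by code unit.
    public static bool SameLemma(string a, string b)
    {
        return string.Equals(Canonical(a), Canonical(b), StringComparison.Ordinal);
    }
}
```

Here `LemmaNormalizer.Canonical("de\uFB01nitions")` returns "definitions", so both spellings would map to a single row in the lookup table.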

风柔一江水 2024-10-24 07:59:23

Try

// Assumes cmd is a SqlCeCommand; AddWithValue is not on the base DbCommand.
cmd.CommandText = "INSERT INTO dictionary (lemma) VALUES (@lemma)";

for (int i = 0; i < terms.Count; i++)
{
    // Rebind @lemma on every pass so only one parameter is ever attached.
    cmd.Parameters.Clear();
    cmd.Parameters.AddWithValue("@lemma", terms[i].Key);
    cmd.ExecuteNonQuery();
}
