I'm trying to insert into an XML column (SQL Server 2008 R2), but the server complains:
System.Data.SqlClient.SqlException (0x80131904):
XML parsing: line 1, character 39, unable to switch the encoding
I found out that the XML column has to be UTF-16 in order for the insert to succeed.
The code I'm using is:
XmlSerializer serializer = new XmlSerializer(typeof(MyMessage));
StringWriter str = new StringWriter();
serializer.Serialize(str, message);
string messageToLog = str.ToString();
How can I serialize the object to a UTF-8 string?
EDIT: OK, sorry for the mix-up: the string needs to be in UTF-8. You were right: it's UTF-16 by default, and if I try to insert UTF-8 it passes. So the question is how to serialize into UTF-8.
Example
This causes errors while trying to insert into SQL Server:
<?xml version="1.0" encoding="utf-16"?>
<MyMessage>Teno</MyMessage>
This doesn't:
<?xml version="1.0" encoding="utf-8"?>
<MyMessage>Teno</MyMessage>
Update
I figured out when SQL Server 2008 needs utf-8 and when it needs utf-16 in the encoding attribute of the XML declaration you're trying to insert into its Xml column type:
When you want to add utf-8, add parameters to the SQL command like this:
sqlcmd.Parameters.Add("ParamName", SqlDbType.VarChar).Value = xmlValueToAdd;
If you try to add an xmlValueToAdd that declares encoding=utf-16 in the previous line, the insert fails with an error. Also, VarChar means that national characters aren't recognized (they come out as question marks).
To add utf-16 to the DB, either use SqlDbType.NVarChar or SqlDbType.Xml in the previous example, or just don't specify the type at all:
sqlcmd.Parameters.Add(new SqlParameter("ParamName", xmlValueToAdd));
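For the VarChar route in the update, the serialized string has to declare encoding="utf-8". XmlSerializer takes the declared encoding from TextWriter.Encoding, and StringWriter reports UTF-16 there, which is where the utf-16 declaration comes from. A common workaround is a StringWriter subclass that reports UTF-8 instead. This is a minimal sketch; the MyMessage shape (a single text value) and the helper names are hypothetical, since the question does not show the real class.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyMessage; [XmlText] makes the
// value the element's content, giving <MyMessage>Teno</MyMessage>.
public class MyMessage
{
    [XmlText]
    public string Text { get; set; }
}

// Reports UTF-8 so XmlSerializer writes encoding="utf-8" in the declaration.
// The result is still an ordinary (UTF-16) .NET string; only the declaration changes.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class MessageXml
{
    public static string SerializeWithUtf8Declaration(MyMessage message)
    {
        var serializer = new XmlSerializer(typeof(MyMessage));

        // An empty prefix/namespace pair suppresses the default xmlns:xsi / xmlns:xsd attributes.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, message, namespaces);
            return writer.ToString();
        }
    }
}
```

The returned string starts with <?xml version="1.0" encoding="utf-8"?> and could be bound to the SqlDbType.VarChar parameter shown above; as the update notes, VarChar still turns characters outside the database code page into question marks.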
This question is a near-duplicate of 2 others, and surprisingly, while this one is the most recent, I believe it is missing the best answer.
The duplicates, and what I believe to be their best answers, are:
In the end, it doesn't matter what encoding is declared or used, as long as the XmlReader can parse it locally within the application server.
As was confirmed in Most efficient way to read XML in ADO.net from XML type column in SQL server?, SQL Server stores XML in an efficient binary format. By using the SqlXml class, ADO.net can communicate with SQL Server in this binary format, and not require the database server to do any serialization or de-serialization of XML. This should also be more efficient for transport across the network.
By using SqlXml, XML will be sent pre-parsed to the database, and then the DB doesn't need to know anything about character encodings, UTF-16 or otherwise. In particular, note that the XML declarations aren't even persisted with the data in the database, regardless of which method is used to insert it.
Please refer to the above-linked answers for methods that look very similar to this, but this example is mine:
Note that I would not consider the last (non-commented) example to be "production-ready", but left it as-is to be concise and readable. If done properly, both the StringReader and the created XmlReader should be initialized within using statements to ensure that their Close() methods are called when complete.
From what I've seen, the XML declarations are never persisted when using an XML column. Even without using .NET and just using this direct SQL insert statement, for example, the XML declaration is not saved into the database with the XML:
Now in terms of the OP's question, the object to be serialized still needs to be converted into an XML structure from the MyMessage object, and XmlSerializer is still needed for this. However, at worst, instead of serializing to a String, the message could instead be serialized to an XmlDocument, which can then be passed to SqlXml through a new XmlNodeReader, avoiding a de-serialization/serialization trip to a string. (See http://blogs.msdn.com/b/jongallant/archive/2007/01/30/how-to-convert-xmldocument-to-xmlreader-for-sqlxml-data-type.aspx for details and an example.)
Everything here was developed against and tested with .NET 4.0 and SQL Server 2008 R2.
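A minimal sketch of the SqlXml route described in this answer, with the StringReader and XmlReader wrapped in using statements as recommended above. The class and method names are hypothetical; SqlXml lives in System.Data.SqlTypes. The XmlDocument overload assumes the document has already been produced (for example by XmlSerializer) and only shows the XmlNodeReader step. SqlXml reads the supplied XmlReader into its own buffer when it is constructed, so disposing the readers afterwards should be safe.

```csharp
using System.Data.SqlTypes;
using System.IO;
using System.Xml;

public static class SqlXmlFactory
{
    // Wraps an XML string in SqlXml. With a StringReader the characters are already
    // decoded, so the encoding named in the XML declaration is not used for decoding.
    public static SqlXml FromString(string xml)
    {
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            return new SqlXml(xmlReader);
        }
    }

    // Wraps an already-built XmlDocument without a round trip through a string.
    public static SqlXml FromXmlDocument(XmlDocument document)
    {
        using (var nodeReader = new XmlNodeReader(document))
        {
            return new SqlXml(nodeReader);
        }
    }
}
```

The resulting value would then go into an Xml-typed parameter, e.g. command.Parameters.Add("@ParamName", SqlDbType.Xml).Value = SqlXmlFactory.FromString(messageToLog); the XML declaration, whatever encoding it names, is dropped along the way.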
Please don't create waste by running XML through extra conversions (de-serializations and serializations, to DOM, strings, or otherwise), as shown in other answers here and elsewhere.
Although a .NET string is always UTF-16, you need to serialize the object using UTF-16 encoding. That should be something like this:
By setting the encoding to Encoding.Unicode, not only will the string be UTF-16, but you should also get the XML string as UTF-16.
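A minimal sketch of serializing with Encoding.Unicode, assuming the XmlWriterSettings plus MemoryStream route and the same hypothetical single-value MyMessage; the helper name is also hypothetical. XmlWriter takes the declared encoding from XmlWriterSettings.Encoding, so the declaration reads utf-16, and reading the bytes back through a StreamReader drops the byte-order mark that Encoding.Unicode writes.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyMessage.
public class MyMessage
{
    [XmlText]
    public string Text { get; set; }
}

public static class Utf16Serialization
{
    public static string Serialize(MyMessage message)
    {
        var serializer = new XmlSerializer(typeof(MyMessage));
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        // Encoding.Unicode is UTF-16 little-endian; it determines both the bytes
        // written and the encoding="utf-16" value in the declaration.
        var settings = new XmlWriterSettings { Encoding = Encoding.Unicode };

        using (var stream = new MemoryStream())
        {
            // CloseOutput defaults to false, so the stream stays open after the writer is disposed.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, message, namespaces);
            }

            stream.Position = 0;
            // StreamReader consumes the UTF-16 byte-order mark instead of returning it as U+FEFF.
            using (var reader = new StreamReader(stream, Encoding.Unicode))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A string produced this way matches what the question's update says NVarChar and Xml parameters expect.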
Isn't the easiest solution to tell the serializer not to output the XML declaration? .NET and SQL should sort the rest out between them.
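A minimal sketch of that suggestion, assuming XmlWriterSettings.OmitXmlDeclaration and the same hypothetical single-value MyMessage; the helper name is hypothetical too. Without a declaration there is no encoding attribute for SQL Server to contradict, so the string no longer conflicts with the parameter type it is sent as.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyMessage.
public class MyMessage
{
    [XmlText]
    public string Text { get; set; }
}

public static class DeclarationFreeSerialization
{
    public static string Serialize(MyMessage message)
    {
        var serializer = new XmlSerializer(typeof(MyMessage));
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        // OmitXmlDeclaration drops <?xml ... encoding="..."?> entirely.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var stringWriter = new StringWriter())
        {
            // Dispose the XmlWriter first so all output is flushed into the StringWriter.
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(xmlWriter, message, namespaces);
            }
            return stringWriter.ToString();
        }
    }
}
```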
It took me forever to re-solve this problem.
I was doing an INSERT statement into SQL Server as something like:
and this gives the error:
And the really, very simple fix is to:
The difference is prefixing the Unicode string with N:
In the former case, an unprefixed string is assumed to be varchar (e.g. Windows-1252 code page). When it encounters the encoding="utf-16" inside the string, there is a conflict (and rightly so, since the string isn't utf-16).
The fix is to pass the string to SQL Server as an nvarchar (i.e. UTF-16):
That way the string is UTF-16, which matches the utf-16 encoding that the XML says it is. The carpet matches the curtains, so to speak.
@ziesemer's answer (above) is the only fully correct answer to this question and the linked duplicates of this question. However, it could still use a little more explanation and some clarification. Consider this as an extension of @ziesemer's answer.
Even if they produce the desired result, most answers to this question (including the duplicate question) are convoluted and go through many unnecessary steps. The main issue here is the overall lack of understanding regarding how the XML datatype actually works in SQL Server (not surprising given that it isn't well documented). The XML type:
msdn site). The optimizations include:
"<ElementName>...</ElementName>" takes up 27 characters (i.e. 54 bytes) in string form, but only 11 characters (i.e. 22 bytes) when stored in the XML type. And that is for a single instance of it. Multiple instances take up additional multiples of the 54 bytes. But in the XML type, each instance only takes up the space of that numeric ID, most likely a 4-byte int.
Can have 8-bit / non-UTF-16 data passed in. In this case, you need to make sure that the string is not an NVARCHAR string (i.e. not prefixed with an upper-case "N" for literals, not declared as NVARCHAR when dealing with T-SQL variables, and not declared as SqlDbType.NVarChar in .NET). AND, you need to make sure that you do have the XML declaration, and that it specifies the correct encoding.
As you can see, when the input string is NVARCHAR, then the XML declaration can be included, but it needs to be "UTF-16".
When the input string is VARCHAR, then the XML declaration can be included, but it cannot be "UTF-16". It can, however, be any valid 8-bit encoding, in which case the bytes for that encoding will be converted into UTF-16, as shown below:
The first example specifies the 4-byte UTF-8 sequence for Smiling Face with Sunglasses, and it gets converted correctly.
The second example uses 4 bytes to represent 4 Hebrew letters making up the word "Shalom", which is converted correctly and displayed correctly, given that the "F9" byte, which is first, is the ש character, which is on the right side of the word (since Hebrew is a right-to-left language). Yet those same 4 bytes display as ùìåí when selected directly, since the default Collation for the current DB is Latin1_General_100_CS_AS_SC.
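The byte-level claims behind the two VARCHAR examples can be checked outside SQL Server. This sketch assumes .NET Core 3.0 or later, where Windows code pages come from CodePagesEncodingProvider; the class name is hypothetical. It decodes the UTF-8 bytes F0 9F 98 8E (U+1F60E, Smiling Face with Sunglasses) and reads the bytes F9 EC E5 ED under Windows-1255 (Hebrew) and under Windows-1252, the code page used by Latin1_General collations.

```csharp
using System.Text;

public static class CodePageDemo
{
    // 4-byte UTF-8 sequence for U+1F60E SMILING FACE WITH SUNGLASSES.
    public static readonly byte[] SunglassesUtf8 = { 0xF0, 0x9F, 0x98, 0x8E };

    // "Shalom" in Windows-1255: shin, lamed, vav, final mem (logical order).
    public static readonly byte[] ShalomBytes = { 0xF9, 0xEC, 0xE5, 0xED };

    static CodePageDemo()
    {
        // On .NET Core / .NET 5+, code pages such as 1252 and 1255 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string DecodeUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    public static string DecodeCodePage(byte[] bytes, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(bytes);
}
```

Under 1255 the four bytes decode to שלום; under 1252 the same bytes decode to ùìåí, which is what the Latin1_General collation shows.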
A string is always UTF-16 in .NET, so as long as you stay inside your managed app you don't have to care about which encoding it is.
The problem is more likely where you talk to the SQL server. Your question doesn't show that code, so it's hard to pinpoint the exact error. My suggestion is to check whether there's a property or attribute you can set on that code that specifies the encoding of the data sent to the server.
You are serializing to a string rather than a byte array, so at this point no encoding has happened yet.
What does the start of "messageToLog" look like? Is the XML specifying an encoding (e.g. utf-8) which subsequently turns out to be wrong?
Edit
Based on your further info it sounds like the string is automatically converted to utf-8 when it is passed to the database, but the database chokes because the XML declaration says it is utf-16.
In which case, you don't need to serialize to utf-8. You need to serialize with the "encoding=" omitted from the XML. The XmlFragmentWriter (not a standard part of .Net, Google it) lets you do this.
The default encoding for the XML serializer should be UTF-16. Just to make sure, you can try:
Try this: delete the encoding from the XML text, something like this:
xmlTXT = xmlTXT.Replace("encoding=\"utf-8\"", "");
or this (the same string written as a verbatim literal)...
xmlTXT = xmlTXT.Replace(@"encoding=""utf-8""", "");