Coding Platform: ASP.NET WebForms 4.0 with C#
Background: I am reading some values from an XML file, and everything worked in my locale (en-US). The XML looks like this:
<?xml version="1.0" encoding="utf-32" ?>
<settings>
<UserRegistration>AutoAuthorize</UserRegistration>
<OpenIDProfile>PromptUser</OpenIDProfile>
<EnableSpamProtection>Yes</EnableSpamProtection>
<MaxAllowedOpenID>2</MaxAllowedOpenID>
<WebsiteURL>http://localhost:70707/blah/</WebsiteURL>
<FacebookOAuthURL>https://graph.facebook.com/oauth/authorize?</FacebookOAuthURL>
<FacebookAccessTokenURL>https://graph.facebook.com/oauth/access_token?</FacebookAccessTokenURL>
<FacebookRedirectPage>ausgefüllt.aspx</FacebookRedirectPage>
<FacebookAppID>192328104139846</FacebookAppID>
<FacebookAppKey>29daeb58d8ae84cc22181f4073e4ed9d</FacebookAppKey>
<FacebookAppSecret>b94e9ddd20efc47b3227e7333925fdd8</FacebookAppSecret>
<FacebookScope>email</FacebookScope>
<EmailSettingsDisplayName>admin</EmailSettingsDisplayName>
<EmailSettingsEmail>[email protected]</EmailSettingsEmail>
<EmailSettingsPassword>192185135098207157230060249027191124199097098215</EmailSettingsPassword>
</settings>
Problem
I packaged the whole thing and sent it to my client for testing. The testing environment is:
Server: Windows Server 2008 R2 64-bit
Locale: German (de-DE)
Now, when I try to read the XML, Elmah reports two errors. The first error is:
System.Xml.XmlException: '????', hexadecimal value 0xA000D, is an invalid character. Line 1, position 40.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
   at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
   at Administrator_SiteSettings.SaveSettingsButton_Click(Object sender, EventArgs e) in c:\Webs\ThirdPartyLogins\Administrator\SiteSettings.aspx.cs:line 48
I am copying these XML node values into a Dictionary, and this error is followed by a key-not-found error for the dictionary.
Is encoding the culprit?
What could be wrong in my code?
Update: I just read the article "UTF-8, UTF-16, and UTF-32". Will changing to utf-8 help?
Update2: Two things that might clarify the issue more.
1) On changing the encoding to utf-16, I got a new error:
System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line 1, position 39.
2) The XML pasted earlier was not complete. It has more nodes, some of them with URLs as node data. Will that be an issue? I have updated the XML above as well.
Comments (1)
Short answer: Yes, the encoding is the culprit; the correct encoding is utf-16.
Long answer: The clue lies in the exception text, where it says "hexadecimal value 0xA000D" and "Line 1, position 40".
When XmlReader reads your file, it first reads the XML declaration (everything between <?xml and ?>) to determine which encoding to use for the rest of the file. In this case the declaration says UTF-32. So immediately after reading the > character at the end of the declaration, it switches to UTF-32. As your linked article explains, UTF-32 uses 4 bytes to represent each character, so the XmlReader reads the next 4 bytes from the file and tries to interpret them as a single character. (This lines up with your error message, since line 1, position 40 is immediately after the > character.)
If the file really were UTF-32, what would the next 4 bytes be? The next thing in the file after the > character is a newline, which on Windows is made up of two characters: carriage return and line feed (U+000D and U+000A respectively). So we would expect the next 4 bytes to be 0D 00 00 00, and the 4 after that to be 0A 00 00 00 (remember, Windows is little-endian).
But as the error message states, the actual "character" read was 0xA000D, which means the next 4 bytes were 0D 00 0A 00 (again, remember little-endian). That is pretty close, but only 2 bytes are being used for each character instead of 4. We have a name for that: it's called UTF-16!
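The byte arithmetic above can be checked with a few lines of code. This is only a sketch, written in Python because it makes raw bytes easy to inspect; it does not use or imitate the .NET XmlReader, and the helper name read_as_utf32_units is made up for this illustration. It encodes a carriage return plus line feed as UTF-16 and as UTF-32 (both little-endian), then reads the bytes back 4 at a time, the way a UTF-32 decoder would.

```python
def read_as_utf32_units(raw):
    """Split raw bytes into little-endian 32-bit units, as a UTF-32LE decoder would.

    Hypothetical helper for illustration; trailing bytes that do not fill
    a complete 4-byte unit are ignored.
    """
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw) - 3, 4)]


crlf = "\r\n"  # carriage return (U+000D) followed by line feed (U+000A)

as_utf16 = crlf.encode("utf-16-le")  # 2 bytes per character, no BOM
as_utf32 = crlf.encode("utf-32-le")  # 4 bytes per character, no BOM

print(as_utf16.hex(" "))  # 0d 00 0a 00
print(as_utf32.hex(" "))  # 0d 00 00 00 0a 00 00 00

# UTF-16 bytes misread as UTF-32: both characters collapse into one bogus value.
print([hex(u) for u in read_as_utf32_units(as_utf16)])  # ['0xa000d']

# Genuine UTF-32 bytes decode to the two expected code points.
print([hex(u) for u in read_as_utf32_units(as_utf32)])  # ['0xd', '0xa']
```

The single value 0xa000d matches the one in the exception message, which is what you would see if the bytes after the declaration are stored 2 per character while the reader consumes them 4 at a time.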