XML parsing issue under the German culture - ASP.NET

Posted on 2024-10-29 06:34:15

Coding Platform: ASP.NET WebForms 4.0 with C#

Background: I am reading some values from XML, and everything works in my locale (en-US). The XML looks like this:

<?xml version="1.0" encoding="utf-32" ?>
<settings>
  <UserRegistration>AutoAuthorize</UserRegistration>
  <OpenIDProfile>PromptUser</OpenIDProfile>
  <EnableSpamProtection>Yes</EnableSpamProtection>
  <MaxAllowedOpenID>2</MaxAllowedOpenID>
  <WebsiteURL>http://localhost:70707/blah/</WebsiteURL>
  <FacebookOAuthURL>https://graph.facebook.com/oauth/authorize?</FacebookOAuthURL>
  <FacebookAccessTokenURL>https://graph.facebook.com/oauth/access_token?</FacebookAccessTokenURL>
  <FacebookRedirectPage>ausgefüllt.aspx</FacebookRedirectPage>
  <FacebookAppID>192328104139846</FacebookAppID>
  <FacebookAppKey>29daeb58d8ae84cc22181f4073e4ed9d</FacebookAppKey>
  <FacebookAppSecret>b94e9ddd20efc47b3227e7333925fdd8</FacebookAppSecret>
  <FacebookScope>email</FacebookScope>
  <EmailSettingsDisplayName>admin</EmailSettingsDisplayName>
  <EmailSettingsEmail>[email protected]</EmailSettingsEmail>
  <EmailSettingsPassword>192185135098207157230060249027191124199097098215</EmailSettingsPassword>
</settings>

Problem

I packaged the whole thing up and sent it to my client for testing. The testing environment is:

Server: Windows Server 2008 R2 64 bit
Locale: German (de-DE)

Now, when I try to read the XML, Elmah reports two errors. The first error is:

System.Xml.XmlException: '????', hexadecimal value 0xA000D, is an invalid character. Line 1, position 40.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
   at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
   at Administrator_SiteSettings.SaveSettingsButton_Click(Object sender, EventArgs e) in c:\Webs\ThirdPartyLogins\Administrator\SiteSettings.aspx.cs:line 48

I copy these XML node values into a Dictionary, and this error is followed by a key-not-found error for the dictionary.
Is encoding the culprit?
What could be wrong in my code?

Update: I just read up on UTF-8, UTF-16, and UTF-32. Will changing to utf-8 help?

Update2: Two things that might clarify the issue more.

1) On changing the encoding to utf-16, I got a new error:

System.Xml.XmlException: '.', hexadecimal value 0x00, is an invalid character. Line 1, position 39.

2) The XML I pasted earlier was not complete. It has more nodes, some of which contain URLs as node data. Could that be an issue? I have also updated the XML above.

昔日梦未散 2024-11-05 06:34:15

Short answer: yes, the encoding is the culprit; the correct encoding is utf-16.

Long answer: the clue is in the exception text, which says "hexadecimal value 0xA000D" and "Line 1, position 40".

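When XmlReader reads the file, it first reads the XML declaration (everything between <?xml and ?>) to determine which encoding the rest of the file uses. In this case the declaration says UTF-32, so immediately after reading the > character at the end of the declaration, it switches to the UTF-32 encoding. As the article you linked explains, UTF-32 uses 4 bytes to represent each character, so XmlReader reads the next 4 bytes from the file and tries to interpret them as a character. (This agrees with your error message, because line 1, position 40 is immediately after the > character.)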
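The declaration is only a claim about the encoding, so it can help to look at what the file actually starts with. Below is a minimal diagnostic sketch that names the encoding implied by a byte-order mark. It assumes the file was saved with a BOM at all, which is not guaranteed; `BomSniffer`, `DescribeBom`, and the default path `settings.xml` are hypothetical names used for illustration, not something from the question.

```csharp
using System;
using System.IO;

static class BomSniffer
{
    // Names the encoding implied by a byte-order mark, or "no BOM" if none matches.
    // UTF-32LE is tested before UTF-16LE because both begin with FF FE.
    public static string DescribeBom(byte[] head)
    {
        if (StartsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (StartsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (StartsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (StartsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "no BOM";
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real settings file as the first argument.
        string path = args.Length > 0 ? args[0] : "settings.xml";
        byte[] head = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // keep only the bytes actually read
        Console.WriteLine("{0}: {1} ({2})", path, DescribeBom(head), BitConverter.ToString(head));
    }
}
```

If the reported encoding disagrees with the `encoding="..."` attribute in the declaration, that mismatch is a likely suspect for errors of this kind.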

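If the file really were UTF-32, what would the next 4 bytes be? Well, the next thing in the file after the > character is a line break, which consists of two characters: a carriage return and a line feed (0D and 0A in Unicode, respectively). So we would expect the next 4 bytes to be 0D 00 00 00, and the 4 bytes after that to be 0A 00 00 00 (remember that Windows is little-endian).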
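The expected byte layout can be checked directly. This is a small sketch using the standard `System.Text.Encoding` classes, where `Encoding.UTF32` and `Encoding.Unicode` are the little-endian UTF-32 and UTF-16 encodings; the class name `NewlineBytes` and helper `Hex` are made up for the example.

```csharp
using System;
using System.Text;

static class NewlineBytes
{
    // Hex dump of a string's bytes in the given encoding.
    // GetBytes does not prepend a byte-order mark.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        // CR LF as little-endian UTF-32: four bytes per character.
        Console.WriteLine(Hex(Encoding.UTF32, "\r\n"));   // 0D-00-00-00-0A-00-00-00
        // CR LF as little-endian UTF-16: two bytes per character.
        Console.WriteLine(Hex(Encoding.Unicode, "\r\n")); // 0D-00-0A-00
    }
}
```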

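But as the error message says, the "character" actually read is A000D, which means the next 4 bytes are 0D 00 0A 00 (again, remember little-endian). That is very close, but clearly each character is using only 2 bytes, not 4. Well, we have a name for that, don't we? It is called UTF-16!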
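To make the misreading concrete, the sketch below takes the four bytes inferred above and interprets them both ways: as one little-endian 32-bit unit (what a UTF-32 decoder would see) and as UTF-16. The byte values come from the error message; the names `MisreadNewline`, `AfterDeclaration`, `AsLittleEndianUInt32`, and `AsUtf16` are illustrative only.

```csharp
using System;
using System.Text;

static class MisreadNewline
{
    // The four bytes that follow the declaration, as inferred from the error message.
    public static readonly byte[] AfterDeclaration = { 0x0D, 0x00, 0x0A, 0x00 };

    // Combine four bytes into one little-endian 32-bit value,
    // independent of the endianness of the machine running this code.
    public static uint AsLittleEndianUInt32(byte[] b)
    {
        return (uint)b[0] | ((uint)b[1] << 8) | ((uint)b[2] << 16) | ((uint)b[3] << 24);
    }

    // Decode the same bytes as little-endian UTF-16.
    public static string AsUtf16(byte[] b)
    {
        return Encoding.Unicode.GetString(b);
    }

    static void Main()
    {
        // Read as a single 32-bit unit: the value from the exception text.
        Console.WriteLine("0x{0:X}", AsLittleEndianUInt32(AfterDeclaration)); // 0xA000D
        // Read as UTF-16: an ordinary carriage return plus line feed.
        Console.WriteLine(AsUtf16(AfterDeclaration) == "\r\n");               // True
    }
}
```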