Decoding extended characters in XML

Posted on 2024-08-17 06:42:35

I know this is probably simple and has probably been asked before, but I'm having trouble coming up with a solution.

I am parsing some RSS feeds which include HTML as CDATA blocks. One example is here: http://g.msn.com/1ewenus50/news2

The feed changes a lot, but there are almost always some extended characters in it. For example if I make a simple console app and use WebClient.DownloadString and look at the result, I see things like

"learned of the alleged attempted Flight 253 bomber’s extremist links while he was mid-flight on Christmas Day. NBC’s Savannah Guthrie reports. (Today Show)"

However those weird characters should be apostrophes, quote marks, em dashes, etc.

What is the trick for getting these to decode correctly?

If it wasn't clear, I'm using C# / .NET for this. In the end this content will be rendered in Silverlight, but I'm seeing the issue in the full .NET 3.5 runtime as well.

Comments (2)

小矜持 2024-08-24 06:42:35

Download it in binary form and parse it as XML. That should get it right - the XML document should be self-describing in terms of the encoding, but I wouldn't put it past some webservers to advertise it (in headers) as having a different encoding, which would confuse DownloadString.

In general, when XML is involved it's worth doing as much as possible within an XML API rather than with the raw data.
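A minimal sketch of that approach, assuming the full .NET runtime (Silverlight only offers the asynchronous download methods, so the download half would differ there). The class and method names `FeedLoader`, `ParseFeed` and `DownloadFeed` are made up for illustration. The point is that the bytes go straight to the XML reader, which picks the encoding from the byte order mark or the XML declaration rather than from the HTTP headers.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class FeedLoader
{
    // Parse raw feed bytes. XmlReader works out the encoding itself from the
    // BOM or the <?xml ... encoding="..."?> declaration in the document.
    public static XDocument ParseFeed(byte[] rawBytes)
    {
        using (var stream = new MemoryStream(rawBytes))
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader);
        }
    }

    // Download the feed as bytes (no text decoding by WebClient) and parse it.
    public static XDocument DownloadFeed(string url)
    {
        using (var client = new WebClient())
        {
            byte[] rawBytes = client.DownloadData(url);
            return ParseFeed(rawBytes);
        }
    }
}
```

Something like `FeedLoader.DownloadFeed("http://g.msn.com/1ewenus50/news2")` would then give you an `XDocument` whose element values (including the CDATA content) are already correctly decoded strings.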

苏大泽ㄣ 2024-08-24 06:42:35

You are probably using the wrong text encoding... I'm not sure which one you are using or which is the right one, but this might put you on the path.
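For what it's worth, the sequence `’` is what the three UTF-8 bytes of a right single quotation mark (E2 80 99) look like when they are read as Windows-1252, so the feed is most likely UTF-8 being decoded with the system default code page. Below is a rough sketch, assuming the feed really is UTF-8; `FeedText`, `DecodeUtf8` and `DownloadAsUtf8` are hypothetical names. Note that `WebClient` may still prefer a charset announced in the response's Content-Type header over the `Encoding` property, so this is not guaranteed to help with every server.

```csharp
using System;
using System.Net;
using System.Text;

public static class FeedText
{
    // Decode bytes explicitly as UTF-8 instead of relying on the default encoding.
    public static string DecodeUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Tell WebClient which encoding to use when it turns the response into a string.
    // Assumption: the feed is UTF-8.
    public static string DownloadAsUtf8(string url)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

For example, `FeedText.DecodeUtf8(new byte[] { 0xE2, 0x80, 0x99 })` yields the single character U+2019 rather than three garbage characters.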
