
Alternatives to XDocument

Posted 2024-11-04 01:40:16

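Hi all, XDocument is being very picky about one of the XML feeds I have to parse, and it keeps giving me this error:

'=' is an unexpected token. The expected token is ';'. Line 1, position 576.

This is basically XDocument complaining about a stray '=' sign in the XML document.

I have no control over the source XML document, so I need to either get XDocument to ignore this error or use some other class. Any ideas on either approach?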


夜唯美灬不弃 2024-11-11 01:40:16

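If the document is not well-formed XML (my guess is that it contains "&=" or some other string that looks like an entity reference), then it is unlikely that any other XML parser will be happy with it either. Have you tried loading the document in IE to see whether it parses there, or pasting it into an XML validator? You could also try XmlDocument.Load() and see whether it parses there; that is the next closest XML parser (aside from XmlReader, which takes some setup).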
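To check that guess, here is a minimal sketch that feeds a small hand-made sample to XDocument and reports what the parser says. The `FeedDiagnostics` class, the `IsWellFormed` helper, and the sample strings are made up for illustration; the real feed is not shown in the question, so the actual cause may differ.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class FeedDiagnostics
{
    // Hypothetical helper: returns true if the text is well-formed XML,
    // otherwise false with the parser's own message in 'error'.
    public static bool IsWellFormed(string xml, out string error)
    {
        try
        {
            XDocument.Parse(xml);
            error = "";
            return true;
        }
        catch (XmlException ex)
        {
            // ex.LineNumber / ex.LinePosition locate the offending character.
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // Assumed sample data: a query string with an unescaped '&'.
        string bad = "<feed><link href=\"http://example.com/?a=1&b=2\" /></feed>";
        // Same content with the ampersand escaped as &amp;.
        string good = "<feed><link href=\"http://example.com/?a=1&amp;b=2\" /></feed>";

        Console.WriteLine(IsWellFormed(bad, out string e1) ? "parsed OK" : e1);
        Console.WriteLine(IsWellFormed(good, out string e2) ? "parsed OK" : e2);
    }
}
```

If the first sample fails the same way your feed does, the problem is most likely an unescaped `&` somewhere before position 576, and no conforming XML parser will accept it as-is.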


同展鸳鸯锦 2024-11-11 01:40:16


It won't make for good XML, but if you need to just load up a bad document then the HTML Agility Pack is a good tool. It can overlook many of the things that make HTML not XHTML and not XML-like, so your erroneous XML input will likely be parsed too. The object model it expresses is similar to XmlDocument. e.g.

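 using HtmlAgilityPack;

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.xml");

 // SelectNodes returns null when nothing matches the XPath
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink is your own helper
     }
 }
 doc.Save("file.htm");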

Or you can use Agility Pack to clean up the XML and then feed its clean output to a real XML parser for further processing.

This is a quick and dirty trick that I've used for one-time tasks. It's not necessarily recommended over a proper solution.

What I would recommend, if time permits, is to fix the erroneous XML content first (e.g. in its string form, or with another tool) before feeding it to an XML parser.
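As a rough sketch of the string-form fix, assuming the culprit really is an unescaped `&` (the question does not show the feed, so this is a guess), you could escape every ampersand that does not already start an entity or character reference and then parse the result. `FeedSanitizer`, `EscapeBareAmpersands`, and the sample text are hypothetical names and data for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FeedSanitizer
{
    // Matches '&' that does NOT begin a named (&name;), decimal (&#123;)
    // or hexadecimal (&#x1F;) reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:[A-Za-z_][A-Za-z0-9_.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)",
        RegexOptions.Compiled);

    // Hypothetical helper: rewrites bare ampersands as &amp;.
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    public static void Main()
    {
        // Assumed sample: one bare '&' and one correctly escaped '&amp;'.
        string raw = "<feed><link href=\"http://example.com/?a=1&b=2\" />"
                   + "<title>Q&amp;A</title></feed>";

        XDocument doc = XDocument.Parse(EscapeBareAmpersands(raw));
        Console.WriteLine(doc.Root.Element("link").Attribute("href").Value);
        Console.WriteLine(doc.Root.Element("title").Value);
    }
}
```

This is still a patch rather than a proper fix. It will also rewrite ampersands inside CDATA sections and comments, where they are legal, and it lets through named references such as `&nbsp;` that XML does not define without a DTD, so check it against real samples of the feed before relying on it.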
