Alternative to XDocument
Hey guys, XDocument is being very finicky with one of the XML feeds I have to parse, and it keeps giving me this error:
'=' is an unexpected token. The expected token is ';'. Line 1, position 576.
That is basically XDocument crying about a loose "=" sign in the XML document.
I don't have any control over the source XML document, so I need to either get XDocument to ignore this error or use some other class. Any ideas on either one?
If the document isn't well-formed XML (and my guess is that you have an unescaped '&' in the document, e.g. '&=' or '&b=' from a URL query string, or some other entity-looking string), then it's unlikely that any other XML parser is going to be any happier with it. Have you tried loading the document in, say, IE to see whether it parses there, or pasting it into an XML validator? You can also just try XmlDocument.Load() and see if it parses there; that's the next closest XML parser (aside from XmlReader, which takes a little bit of setting up).
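To check from code whether a feed is well-formed, you can hand the same text to both parsers and compare. This is only an illustrative sketch: the class and method names are hypothetical, and it uses XmlDocument.LoadXml, the string-input counterpart of XmlDocument.Load(). For a snippet containing an unescaped '&' (such as a URL query string), both parsers should throw an XmlException, because the input is not well-formed XML.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helpers: each returns null when its parser accepts the text,
// otherwise the parser's error message (typically ending in "Line X, position Y.").
public static class WellFormednessCheck
{
    public static string TryXDocument(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return "XDocument: " + ex.Message;
        }
    }

    public static string TryXmlDocument(string xml)
    {
        try
        {
            // LoadXml parses a string; Load() reads from a file path/URL, stream, or reader.
            new XmlDocument().LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return "XmlDocument: " + ex.Message;
        }
    }
}
```

If both methods return a message for your feed, switching parsers won't help and the input itself has to be fixed; the exact message wording may differ between .NET versions.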
It won't make for good XML, but if you just need to load a bad document, the HTML Agility Pack is a good tool. It overlooks many of the things that keep HTML from being XHTML or XML-like, so your erroneous XML input will likely be parsed too. The object model it exposes is similar to XmlDocument.
Or you can use the Agility Pack to clean up the XML and then feed its clean output to a real XML parser for further processing.
This is a quick and dirty trick that I've used for one-time tasks. It's not necessarily recommended over a proper solution.
If time permits, what I would recommend is to somehow format/fix the erroneous XML content (e.g. in its string form, or using another tool) before feeding it to an XML parser.
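For the "fix it in string form first" route, one common technique (a regular expression, not the Agility Pack) is to escape every '&' that does not already start an entity or character reference, then pass the result to XDocument. This is a minimal sketch assuming bare ampersands are the only defect: XmlSanitizer, EscapeBareAmpersands, and ParseLenient are hypothetical names, the entity-name pattern is a simplified approximation of the XML name rules, and it would wrongly escape a '&' inside CDATA sections or comments. It also does nothing for undefined entities such as &nbsp;, which XDocument will still reject.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper. Assumption: the only problem in the feed is unescaped '&'.
public static class XmlSanitizer
{
    // Matches an '&' that is NOT followed by a (simplified) entity name,
    // a decimal character reference, or a hex character reference, plus ';'.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:[A-Za-z_][A-Za-z0-9_.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string EscapeBareAmpersands(string xml)
    {
        // '&' has no special meaning in a .NET replacement string, so this is literal.
        return BareAmpersand.Replace(xml, "&amp;");
    }

    public static XDocument ParseLenient(string xml)
    {
        // Any other well-formedness error still surfaces as an XmlException.
        return XDocument.Parse(EscapeBareAmpersands(xml));
    }
}
```

Existing references like &amp; are left alone, so running the cleanup on an already-valid feed should not change it.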
Take a look at the answers to this question: Parsing an XML/XHTML document but ignoring errors in C#
I believe the best option is to parse it in a try/catch block, remove the offending content inside the catch block, and re-parse.
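The try/catch approach can be sketched as a loop that reads LineNumber and LinePosition from the XmlException, repairs the text near that spot, and parses again. This sketch assumes, as in the question, that the offending content is a bare '&': rather than removing anything, it escapes the nearest '&' at or before the reported position, and it rethrows as soon as the error is not of that kind or after a hypothetical cap of 50 repairs. RetryingParser and ParseWithRepairs are illustrative names, and the offset calculation only treats '\n' as a line break.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper. Assumption: the errors being repaired are bare '&' characters.
public static class RetryingParser
{
    // A well-formed (simplified) entity or character reference at the start of the string.
    private static readonly Regex ValidReference =
        new Regex(@"^&(?:[A-Za-z_][A-Za-z0-9_.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);");

    public static XDocument ParseWithRepairs(string xml, int maxRepairs = 50)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return XDocument.Parse(xml);
            }
            catch (XmlException ex) when (attempt < maxRepairs)
            {
                int offset = ToOffset(xml, ex.LineNumber, ex.LinePosition);
                // Walk back from the reported position to the nearest '&'.
                int amp = xml.LastIndexOf('&', Math.Min(offset, xml.Length - 1));
                if (amp < 0 || ValidReference.IsMatch(xml.Substring(amp)))
                    throw; // not a bare-ampersand problem: give up and propagate
                xml = xml.Substring(0, amp) + "&amp;" + xml.Substring(amp + 1);
            }
        }
    }

    // Converts the 1-based (line, column) reported by XmlException into a 0-based index.
    private static int ToOffset(string text, int line, int column)
    {
        int index = 0;
        for (int current = 1; current < line; current++)
        {
            int newline = text.IndexOf('\n', index);
            if (newline < 0) break;
            index = newline + 1;
        }
        return Math.Max(0, index + column - 1);
    }
}
```

Because each pass fixes only one character and re-parses, this is slower than cleaning the whole string up front, but it only touches places the parser actually complained about.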