Override or ignore undeclared entities in C# with LINQ
I have a little utility that uses LINQ to look for certain things in XML files. It processes a MASSIVE collection of them quickly and nicely. However, about 20% of one batch of files fail to load and are skipped, because the files contain the degree symbol written as the entity reference `&deg;`. This is the same "Reference to undeclared entity 'deg'." error that a previous question was about.
The solutions offered in that previous question cannot be applied directly here. I am not at liberty to modify the files, and making copies of them and replacing the references or inserting tags into the copies seems inefficient. What would be the best way to get LINQ to ignore the undeclared entities, which have no bearing on what my program does anyway? Or is there a good way to feed `XDocument.Load` some entity declarations beforehand?
Answer
Unfortunately, entities are part of the well-formedness rules for XML (2.1 Well-Formed XML Documents): in a document without a DTD, referencing any entity other than the five predefined ones (`amp`, `lt`, `gt`, `apos`, `quot`) makes the document not well-formed. It sounds like you want `XDocument.Load` to load something that is nominally an XML file but does not actually conform to those rules, and, quite reasonably, it won't.

If your users are passing you what are supposed to be XML files but that contain undefined entities, then either you have to get them to provide the files in a valid format, or you have to manage the problem yourself at load time, in the ways that have been suggested.
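To see the rule in action, here is a minimal C# sketch (the class and method names are made up for illustration). With no DTD, a named reference such as `&deg;` makes `XDocument.Parse` throw an `XmlException`, while the equivalent numeric character reference `&#176;` needs no declaration and parses fine.

```csharp
using System.Xml;
using System.Xml.Linq;

static class WellFormednessDemo
{
    // Returns the parser's error message, or null if the text is well-formed.
    public static string? TryParse(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            // For "&deg;" with no DTD, the parser reports an undeclared entity here.
            return ex.Message;
        }
    }
}
```

For example, `TryParse("<reading>21&deg;</reading>")` returns an error message, while `TryParse("<reading>21&#176;</reading>")` returns null. The exact message text depends on the .NET version and UI culture, so the sketch only reports whether parsing failed rather than matching on the wording.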
It seems to me, from your restrictions, that the neatest approach would be to follow the linked example and create some `XmlReaderSettings` to pass into the `XmlReader`, along the lines of "Validating an XML Document in the DOM". If there are entities that aren't defined and aren't listed anywhere public, you'll need to declare all the entities you need yourself. Entities can only be declared in a DTD (an XSD schema has no way to declare them), so this custom definition has to be a DTD rather than a schema. So, create generic settings for the `XmlReader` that bring in your own custom entity declarations. Add the necessary entities to them as certain files fail to load, and you'll gradually build up a list of all the entities you need to define for the XML files to be valid.

Then, for each document you try to load, create an `XmlReader` for the file using the settings above and call the `XDocument.Load(XmlReader)` overload.
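Here is a minimal sketch of that load step, under some stated assumptions. It fills in a detail the answer leaves open: because the declarations must be a DTD, it passes them as a DTD internal subset through an `XmlParserContext`, which `XmlReader.Create` accepts alongside the settings, so the files themselves are never copied or modified. The class name `EntityTolerantLoader` and the contents of the `KnownEntities` table are hypothetical. It assumes the files carry no `<!DOCTYPE>` of their own (a second DTD would most likely be rejected) and that `rootName` is the documents' root element name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class EntityTolerantLoader
{
    // Hypothetical entity table: grow it as failing files reveal new names.
    // Each name maps to the Unicode code point it should expand to.
    static readonly Dictionary<string, int> KnownEntities = new Dictionary<string, int>
    {
        ["deg"] = 0x00B0,  // degree sign
        ["nbsp"] = 0x00A0, // no-break space
    };

    // The default DtdProcessing.Prohibit would refuse the supplied DTD.
    static readonly XmlReaderSettings Settings =
        new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

    // Builds a DTD internal subset such as "<!ENTITY deg '&#176;'>".
    static string BuildInternalSubset()
    {
        var sb = new StringBuilder();
        foreach (KeyValuePair<string, int> entity in KnownEntities)
        {
            sb.Append("<!ENTITY ").Append(entity.Key)
              .Append(" '&#").Append(entity.Value).Append(";'>");
        }
        return sb.ToString();
    }

    // Makes the reader behave as if the document began with
    // <!DOCTYPE rootName [ ...internal subset... ]>.
    static XmlParserContext CreateContext(string rootName)
    {
        return new XmlParserContext(
            new NameTable(), null,
            rootName, null, null, BuildInternalSubset(),
            null, null, XmlSpace.None);
    }

    // Loads from any TextReader (handy for tests with a StringReader).
    public static XDocument Load(TextReader input, string rootName)
    {
        using (XmlReader reader = XmlReader.Create(input, Settings, CreateContext(rootName)))
            return XDocument.Load(reader);
    }

    // Loads a file; a Stream lets the reader honour the file's encoding declaration.
    public static XDocument Load(string path, string rootName)
    {
        using (Stream file = File.OpenRead(path))
        using (XmlReader reader = XmlReader.Create(file, Settings, CreateContext(rootName)))
            return XDocument.Load(reader);
    }
}
```

With this in place, `<reading>21&deg;</reading>` should load with the root's value `21°`, and the resulting `XDocument` can be queried with LINQ exactly as before. Adding a newly discovered entity is a one-line change to the table, which matches the "build up a list as files fail" workflow described above.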