XML schema validation with RelaxNG
Which XML validation tools would you recommend for both performance and accuracy? Both are critical issues for our system. We have the following requirements:
- It must not be xmllint (see below)
- Supports RelaxNG
- Can easily integrate with Perl (this is optional, but it would be nice)
Why not xmllint? (This is background and you can skip it if you like)
We have a large Perl system which uses RelaxNG to validate our XML. We use the compact RelaxNG format and trang to convert it to the standard RelaxNG format. Then we do the actual validation via xmllint.
That's when the problems kick in. xmllint routinely reports validation errors incorrectly. It doesn't give false positives or false negatives, but when a document fails to validate, xmllint often reports the wrong element or attribute for a given error. Sometimes the error is correct ("did not expect to see element 'bar'"), but only because an earlier error was not reported: 'bar' was supposed to follow the required but missing element 'foo', and xmllint doesn't tell us that part. Note that this is a long-standing problem with xmllint, and even the latest version has the same problems. We often have huge XML documents, and misreported errors cause a great deal of grief for both clients and developers.
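A minimal reproduction makes a misreport like this concrete and easy to attach to a bug report. The Python sketch below writes a hypothetical schema in which `foo` is required before `bar`, plus an instance that omits `foo`, and then runs the same trang-then-xmllint pipeline described above if both tools are on the PATH. The file names and the `trang` launcher name are assumptions (trang is often run as `java -jar trang.jar` instead); no particular xmllint output is claimed here.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical minimal schema: 'foo' is required and must precede 'bar'.
RNC_SCHEMA = """start = element root {
  element foo { text },
  element bar { text }
}
"""
# Instance that omits the required 'foo', so 'bar' appears too early.
INVALID_DOC = "<root><bar>x</bar></root>\n"


def write_repro(workdir: Path) -> Dict[str, List[str]]:
    """Write the test case and return the trang and xmllint commands for it."""
    rnc = workdir / "repro.rnc"
    rng = workdir / "repro.rng"
    xml = workdir / "repro.xml"
    rnc.write_text(RNC_SCHEMA)
    xml.write_text(INVALID_DOC)
    return {
        # trang infers the input and output syntax from the file extensions.
        "convert": ["trang", str(rnc), str(rng)],
        # --noout stops xmllint echoing the document; only diagnostics are printed.
        "validate": ["xmllint", "--noout", "--relaxng", str(rng), str(xml)],
    }


def run_if_available(commands: Dict[str, List[str]]) -> None:
    """Run each step in order, skipping the rest if a tool is missing or conversion fails."""
    for step in ("convert", "validate"):
        argv = commands[step]
        if shutil.which(argv[0]) is None:
            print(f"skipping {step}: {argv[0]} not found on PATH")
            return
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout + result.stderr)
        if step == "convert" and result.returncode != 0:
            return


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_if_available(write_repro(Path(tmp)))
```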
I think JDrago has the right idea: you need to avoid libxml2-based tools for RNG validation, at least for now. I'm discovering this in my own project as well. I recently logged two bugs against libxml2 concerning RNG validation.
I recommend jing. It was written by James Clark, the creator of Relax NG and one of the leading lights in the XML world. He is also the author of trang, which you are already using. Development of this code (and of trang) recently resumed at the Google Code site I link to above.
Jing has proved consistently correct with our content and schema, and to give much better error messages than libxml2, though there is still a lot of room for improvement in that regard.
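If you switch to jing, the Perl side only needs to run it as an external command and read its diagnostics. The Python sketch below shows that pattern (the same approach carries over to Perl's `system` or backticks). It runs `java -jar <jar> -c schema.rnc doc.xml ...`, where `-c` selects the compact syntax so the trang step could be dropped, and parses each diagnostic line it recognises. The `file:line:column: error: message` line format, the non-zero exit status on invalid input, and the jar path are assumptions to check against the jing version you install; `validate_with_jing` and `parse_jing_output` are hypothetical helper names.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List, Sequence

# Assumed shape of a jing diagnostic line: "<file>:<line>:<column>: <severity>: <message>".
_DIAGNOSTIC = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>error|fatal|warning): (?P<message>.*)$"
)


@dataclass
class Diagnostic:
    file: str
    line: int
    column: int
    severity: str
    message: str


def parse_jing_output(text: str) -> List[Diagnostic]:
    """Turn jing's text output into structured records; unrecognised lines are skipped."""
    found = []
    for raw in text.splitlines():
        match = _DIAGNOSTIC.match(raw.strip())
        if match:
            found.append(
                Diagnostic(
                    file=match["file"],
                    line=int(match["line"]),
                    column=int(match["column"]),
                    severity=match["severity"],
                    message=match["message"],
                )
            )
    return found


def validate_with_jing(jar: str, schema_rnc: str, documents: Sequence[str]) -> List[Diagnostic]:
    """Validate documents against a compact-syntax schema; an empty list means all are valid."""
    # -c tells jing the schema uses the compact syntax, so no trang step is needed.
    argv = ["java", "-jar", jar, "-c", schema_rnc, *documents]
    result = subprocess.run(argv, capture_output=True, text=True)
    # Read both streams rather than relying on which one jing writes to.
    diagnostics = parse_jing_output(result.stdout + "\n" + result.stderr)
    if result.returncode != 0 and not diagnostics:
        # Non-zero exit with nothing parseable: surface the raw output instead of hiding it.
        raise RuntimeError(f"jing exited with {result.returncode}: {result.stderr.strip()}")
    return diagnostics
```

Returning structured records, rather than raw text, lets the calling system sort errors by file and line before showing them to clients.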
The one shortcoming of jing compared with libxml2/xmllint is that it does not currently use OASIS XML catalogs to resolve public and system identifiers, or URIs pointing to schemas. This is an issue if you include schemas that are referenced by an 'http' URI; those will always be fetched over the network.
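Until jing reads OASIS catalogs, one workaround is to do the catalog's job yourself before validating: keep local copies of the remote schemas and rewrite the `http` references in the XML-syntax schema that trang produces. The Python sketch below uses only the standard library's ElementTree to rewrite the `href` of RELAX NG `include` and `externalRef` elements according to a mapping, and reports any remote references left unmapped. The mapping, the example.com URLs, and the local paths are hypothetical placeholders; relative paths resolve against wherever the rewritten schema is saved, and ElementTree drops XML comments when it re-serialises.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RNG_NS = "http://relaxng.org/ns/structure/1.0"
# RELAX NG (XML syntax) elements whose href pulls in another schema file.
_REFERENCING_TAGS = {f"{{{RNG_NS}}}include", f"{{{RNG_NS}}}externalRef"}


def localise_schema_refs(schema_xml: str, local_copies: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite remote hrefs to local paths; return the new schema and any unmapped remote hrefs."""
    # Serialise RELAX NG elements with a default xmlns="..." instead of an "ns0:" prefix.
    ET.register_namespace("", RNG_NS)
    root = ET.fromstring(schema_xml)
    unmapped = []
    for element in root.iter():
        if element.tag not in _REFERENCING_TAGS:
            continue
        href = element.get("href", "")
        if href in local_copies:
            element.set("href", local_copies[href])
        elif href.startswith(("http://", "https://")):
            # Still remote: jing would fetch this over the network.
            unmapped.append(href)
    return ET.tostring(root, encoding="unicode"), unmapped
```

A typical use would be to run this over the `.rng` file trang emits, save the result next to the local copies, and pass that file to jing; a non-empty second return value lists the schemas that still need a local copy.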
Hamcrest Schema allows you to validate XML documents against RelaxNG schemas using Hamcrest matchers.
I suspect xmllint uses the same underlying libraries (libxml2, etc.) as everything else. It is counterintuitive to think that another front-end to the same library would give different results.
rnv is very fast, free (as in free speech), and runs on the command line (so Perl can invoke it easily). Most of the time, its messages are fine. Unfortunately, it no longer seems to be maintained.
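Because rnv is a plain command-line program that reads the compact syntax directly, a wrapper needs neither Java nor a trang step. The Python sketch below (the same shape works with Perl's `system`) passes several documents in a single run so the schema is parsed once per batch rather than once per file, which matters when there are many documents. Accepting several documents per run, using the exit status to signal invalid input, and the helper names `rnv_command` and `validate_with_rnv` are all assumptions rather than confirmed rnv behaviour; check them against the rnv documentation for your build.

```python
import shutil
import subprocess
from typing import List, Sequence, Tuple


def rnv_command(schema_rnc: str, documents: Sequence[str], executable: str = "rnv") -> List[str]:
    """Build one rnv invocation covering many documents, so the schema is loaded only once."""
    if not documents:
        raise ValueError("at least one document is required")
    # rnv takes the compact-syntax schema directly; no trang conversion is needed.
    return [executable, schema_rnc, *documents]


def validate_with_rnv(
    schema_rnc: str, documents: Sequence[str], executable: str = "rnv"
) -> Tuple[bool, str]:
    """Run rnv; return (all_valid, combined output). Assumes a non-zero exit means invalid input."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    result = subprocess.run(
        rnv_command(schema_rnc, documents, executable), capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr
```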
I am the author of RNV. It is maintained on sourceforge.net, and there is a maintainer who takes care of both the SourceForge and Debian package builds. The code has not changed because it is stable; no bugs have been reported.