How to manage DTD URLs and avoid fetching DTD URL references at runtime?

Posted on 2025-01-29 06:50:32

I have software running at different sites for over a decade now. That software uses, among other libraries, Hibernate and Jetty. Both of those have XML configuration files that have a DOCTYPE declaration with an http:// URL. Yes, HTTP, because 10 years ago HTTPS wasn't the default for such public resources.

A few months ago the system stopped building because some of these URLs had vanished, such as the old Jetty one at mortbay.org or the Hibernate one at SourceForge. Interestingly, the Jetty URL at mortbay.org had been defunct for years without ever causing a problem.

But with Hibernate it has become a real problem. It really does seem to go out and try to connect to these hibernate.org servers. The URLs changed twice, the chain of redirects failed somewhere, and then their certificate got updated. It has become a holy mess!

These DTDs should never be grabbed from the internet. I frankly don't even know how our system could ever have worked behind some egress firewall that never planned for these calls to succeed.

I know in the old days of SGML and DTDs there was something like a "catalog" which would come and serve these DTDs and other imported entities. But that was before the URL was invented.
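
For what it's worth, catalogs did carry over to URL-style system identifiers in the form of OASIS XML Catalogs, and Java 9 and later ship a `javax.xml.catalog` package for them. The following is only a rough sketch of the idea, not something this system uses: the class name `DtdCatalogSketch`, its two helper methods, and the local file layout are assumptions made up for the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

public class DtdCatalogSketch {

    // Writes a small OASIS XML catalog into the given directory. Both the http and
    // the https system IDs mentioned in this post map to the same local file name,
    // which is resolved relative to the location of the catalog file.
    public static Path writeCatalog(Path dir) throws IOException {
        String xml = String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">",
            "  <system systemId=\"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd\"",
            "          uri=\"hibernate-mapping-3.0.dtd\"/>",
            "  <system systemId=\"https://hibernate.org/dtd/hibernate-mapping-3.0.dtd\"",
            "          uri=\"hibernate-mapping-3.0.dtd\"/>",
            "</catalog>");
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, xml.getBytes(StandardCharsets.UTF_8));
        return catalogFile;
    }

    // Looks up a system ID in the catalog and returns the local URI it maps to,
    // or null when the catalog has no entry for it.
    public static String lookup(Path catalogFile, String systemId) {
        Catalog catalog = CatalogManager.catalog(CatalogFeatures.defaults(), catalogFile.toUri());
        return catalog.matchSystem(systemId);
    }
}
```

To have a parser actually use such a catalog, `CatalogManager.catalogResolver(...)` returns an object that implements `EntityResolver`, and the `javax.xml.catalog.files` system property can point the JDK's own parsers at a catalog file. Whether a given library lets you do either is exactly the "it depends" part.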

I then went in and changed all those URLs to classpath URLs (I have my own protocol handler for that) and put the DTD files into the classpath. After a lot of searching and editing of all of these files, I managed to get it to work, since the .dtd files are now actually somewhere in the classpath.
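
For readers unfamiliar with the idea, a `classpath:` protocol handler can be sketched roughly as below. This is not the handler actually used here, only an illustration of the technique; the class name `ClasspathHandler` and the helper `toUrl` are made up for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class ClasspathHandler extends URLStreamHandler {

    // Opens a classpath: URL by looking its path up as a class loader resource
    // and delegating to whatever URL the class loader returns for it.
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        String path = url.getPath();
        if (path.startsWith("/")) {
            path = path.substring(1); // class loader resource names have no leading slash
        }
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        URL resource = loader.getResource(path);
        if (resource == null) {
            throw new FileNotFoundException("Not found on classpath: " + path);
        }
        return resource.openConnection();
    }

    // Builds a classpath: URL bound to this handler, without registering the
    // protocol for the whole JVM.
    public static URL toUrl(String spec) throws MalformedURLException {
        return new URL(null, spec, new ClasspathHandler());
    }
}
```

A URL built with `toUrl` only works where you create the URL yourself. For an XML parser to follow a `classpath:` system ID inside a DOCTYPE, the protocol has to be known JVM-wide, for example through `URL.setURLStreamHandlerFactory`.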

For example, for the Hibernate one I give:

classpath:/org/hibernate/hibernate-mapping-3.0.dtd

and that works. Initially that was:

http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd

and that reference broke somehow. So it had to be replaced with

https://hibernate.org/dtd/hibernate-mapping-3.0.dtd

which then broke because of some PKIX SSL problems. This has finally (after over 10 years) directed my attention to this problem.

Perhaps there once was something that caught the http://.../hibernate-mapping-3.0.dtd and replaced it with getting /org/hibernate/hibernate-mapping-3.0.dtd from the classpath. But that might have broken down as soon as the URL scheme was changed from http: to https:, maybe, just guessing.
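
To make that guess concrete: with the standard SAX `EntityResolver` interface, such an interception could look something like the sketch below. I am not claiming this is what Hibernate or dom4j actually do. The class name `LocalDtdResolver`, the classpath prefix, and the choice to match on the file name only (so that http: versus https: makes no difference) are all assumptions for the illustration.

```java
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {

    // Hypothetical classpath folder holding local DTD copies, e.g. "org/hibernate/".
    private final String classpathPrefix;

    public LocalDtdResolver(String classpathPrefix) {
        this.classpathPrefix = classpathPrefix;
    }

    // Keeps only the file name of a system ID; scheme, host and directories are ignored.
    public static String fileNameOf(String systemId) {
        int slash = systemId.lastIndexOf('/');
        return slash < 0 ? systemId : systemId.substring(slash + 1);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(".dtd")) {
            return null; // not a DTD: leave it to the parser's default behaviour
        }
        String resource = classpathPrefix + fileNameOf(systemId);
        InputStream in = LocalDtdResolver.class.getClassLoader().getResourceAsStream(resource);
        // No local copy: hand back an empty DTD rather than going to the network.
        InputSource source = (in != null) ? new InputSource(in)
                                          : new InputSource(new StringReader(""));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parses an XML string with the resolver installed, so DTD lookups stay local.
    public static Document parse(String xml, String classpathPrefix) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver(classpathPrefix));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Falling back to an empty DTD is a deliberate shortcut in this sketch. It silently drops attribute defaults, ID declarations and entities, so throwing an exception instead would be the stricter choice when the DTD is really needed.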

So I want to know how this was ever supposed to work in general, with these DTDs being sourced from HTTP(S) URLs, potentially making slow calls to the internet every time such a file is opened?

And of course "it depends" on which XML parser or processor you use. But that's the thing: we often don't even control that. These XML configuration files are read by whatever parser each library chooses, so I am not asking for a special solution for one particular package. I am asking how this is supposed to work in general.

With Schema namespaces it's not a problem, because those URIs aren't actually dereferenced. To get the XSD files you can specify an xsi:schemaLocation, but you don't have to, and in any case it is not the same as the namespace URI. With DTDs, some you don't even need, because in many cases you don't need instance validation. Others you do need, because certain XML features, such as ID and IDREF, do not work at all unless the DTD's attribute definitions have been read, and then there are entity replacements, etc. So DTDs are sometimes needed to process an XML file at all. And that's where the problem with these URLs comes in.

If you want a specific example for the problem see here: Hibernate 3.5 throws "org.dom4j.DocumentException: Error on line 1 of document http://www.hibernate.org/dtd/hibernate-mapping-3.0.dtd
