How do I get Xalan to stop attempting to validate the XML returned from the "document" function?
Yesterday Oracle decided to take down java.sun.com for a while. This broke things for me because Xalan tried to validate some XML but couldn't retrieve properties.dtd.
I'm using Xalan 2.7.1 to run some XSL transforms, and I don't want it to validate anything, so I tried loading the XSL like this:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(false);
XMLReader rdr = spf.newSAXParser().getXMLReader();
Source xsl = new SAXSource(rdr, new InputSource(xslFilePath));
Templates cachedXSLT = factory.newTemplates(xsl);
Transformer transformer = cachedXSLT.newTransformer();
transformer.transform(xmlSource, result);
In the XSL itself, I do something like this:
<xsl:variable name="entry" select="document(concat($prefix, $locale_part, $suffix))/properties/entry[@key=$key]"/>
The XML this code retrieves has the following definition at the top:
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="...
Despite the Java code above instructing the parser NOT to validate, it still sends a request to java.sun.com. While java.sun.com is unavailable, this makes the transform fail with the message:
Can not load requested doc: http://java.sun.com/dtd/properties.dtd
How do I get Xalan to stop trying to validate the XML loaded from the "document" function?
The documentation mentions that the parser may read the DTDs even if not validating, as it may become necessary to use the DTD to resolve (expand) entities.
Since I don't have control over the XML documents, nont's option of modifying the XML was not available to me.
I managed to shut down attempts to pull in DTD documents by sabotaging the resolver, as follows.
My code uses a DocumentBuilder to return a Document (= DOM), but the XMLReader in the OP's example also has a setEntityResolver method, so the same technique should work with that.
Here, now, is my fake resolver: it returns an empty InputStream no matter what's asked of it.
Alternatively, your fake resolver could return streams of actual documents read as local resources or whatever.
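A minimal sketch of such a fake resolver wired into a DocumentBuilder, assuming the JDK's built-in parser. The class name EmptyResolverDemo is hypothetical, and an empty character stream stands in for the empty InputStream mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class EmptyResolverDemo {
    // Answers every external-entity request (including the DTD named in the
    // DOCTYPE) with an empty document, so the parser has nothing to download.
    static final EntityResolver EMPTY =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    static Document parse(InputSource in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setEntityResolver(EMPTY); // install the fake resolver
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">"
            + "<properties><entry key=\"greeting\">hello</entry></properties>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

To serve a real DTD instead, the lambda could inspect systemId and return an InputSource opened on a local copy of the file.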
Be aware that disabling DTD loading will cause parsing to fail if the DTD defines any entities that your XML file depends on. That said, to disable DTD loading, try this, which assumes you're using the default Xerces that ships with Java.
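One way this can look, as a sketch rather than a definitive recipe: the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd is switched off on the factory. Using a DocumentBuilderFactory here is an assumption, and the class name NoExternalDtdDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class NoExternalDtdDemo {
    // Xerces-specific feature; a different parser implementation may reject it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(false);
        // Throws ParserConfigurationException if the parser does not know the feature.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">"
            + "<properties><entry key=\"greeting\">hello</entry></properties>";
        System.out.println(parse(xml).getDocumentElement().getNodeName());
    }
}
```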
If you really need the DTD, the other alternative is to implement a local XML catalog, to which you will have to provide the appropriate DTDs and an XML catalog definition. This Wikipedia article and this article were helpful.
CatalogResolver looks at the system property xml.catalog.files to determine what catalogs to load.
Try using setFeature on SAXParserFactory. Try this:
I think that should be enough, otherwise try setting a few other features:
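A hedged sketch of what setting these features on SAXParserFactory could look like. The exact feature list is an assumption: the first URI is Xerces-specific and the other two are standard SAX2 feature names. The class name SaxFeatureDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, for illustration only.
class SaxFeatureDemo {
    static XMLReader newReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setValidating(false);
        // Xerces-specific: do not load the external DTD when not validating.
        spf.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // Standard SAX2 features to try as well: ignore external entities.
        spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return spf.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">"
            + "<properties><entry key=\"greeting\">hello</entry></properties>";
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        System.out.println("parsed without loading the DTD");
    }
}
```

The reader returned by newReader() can be passed to a SAXSource in the same way as rdr in the question.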
I just ended up stripping the doctype declaration out of the XML, because nothing else worked. When I get around to it, I'll try this: http://www.sagehill.net/docbookxsl/UseCatalog.html#UsingCatsXalan
Sorry for necroposting, but I have found a solution which actually works and decided I should share it.
1. For some reason, setValidating(false) doesn't work. In some cases, it still downloads external DTD files. To prevent this, you should attach a custom EntityResolver as advised here:
The EntityResolver will be called for every external entity request. Returning null will not work, because the framework will then still download the file from the Internet. Instead, you can return an empty stream, which is a valid DTD, as advised here:
2. You are calling setValidating(false) on the SAX parser that reads your XSLT code. That is, it will not validate your XSLT. When the transformer encounters a document() function, it loads the linked XML file using another parser, which still validates it and also downloads external entities. To handle this, you should attach a custom URIResolver to the transformer:
The transformer will call your URIResolver implementation when it encounters the document() function. Your implementation will have to return a Source for the passed URI. The simplest thing is to return a StreamSource as advised here. But in your case you should parse the document yourself, preventing validation and external requests by using the customized SAXParser you already have (or by creating a new one each time).
So you will have to implement two custom interfaces in your code.
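The two pieces can be combined roughly as follows. This is a sketch under assumptions, not the code the links above pointed to: the class name NonValidatingUriResolver is hypothetical, a new parser is created on each call, and the href is resolved against the base URI with java.net.URI.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical class name, for illustration only.
class NonValidatingUriResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.setValidating(false);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            // Interface 1 (EntityResolver): answer every external-entity
            // request, including the DTD, with an empty stream.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            // Interface 2 (URIResolver): resolve href against the base URI and
            // return a SAXSource that carries our customized reader.
            String systemId = (base == null || base.isEmpty())
                ? href
                : URI.create(base).resolve(href).toString();
            return new SAXSource(reader, new InputSource(systemId));
        } catch (Exception e) {
            throw new TransformerException("Cannot resolve " + href, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Source src = new NonValidatingUriResolver()
            .resolve("props.xml", "file:/tmp/xsl/style.xsl");
        System.out.println(src.getSystemId());
    }
}
```

With the question's code, the resolver would be attached by calling transformer.setURIResolver(new NonValidatingUriResolver()) before transformer.transform(xmlSource, result).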