使用Java解析XML文件和文件路径中的空格

发布于 2024-07-27 18:33:38 字数 4846 浏览 3 评论 0原文

我的文件系统(Windows XP 上)上有文件。 我想使用 Java (JRE 1.6) 解析它们。

问题是,我不明白当文件路径中有空格时 Java 和 Xerces 如何协同工作。

如果文件路径中没有空格,则一切正常。

如果有空格,即使我使用 FileInputStream 实例调用解析器,我也可能会遇到这种麻烦:(

java.net.UnknownHostException: .
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

sun.net.ftp.FtpClient.openServer ?? ?Wtf?)

否则会遇到这种麻烦:(

java.net.MalformedURLException: unknown protocol: d
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)

它说未知协议:d,因为我猜该文件位于D驱动器上。)

有谁知道为什么会发生这种情况以及如何发生吗 ?规避问题? 我尝试提供自己的 EntityResolver 但我的日志告诉我它在崩溃之前甚至没有被调用。


编辑:

这是调用解析器的代码。

public Document fileToDom(File file) throws ProcessException {
    Document doc = null;
    try {
        DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = db.newDocumentBuilder();
        if (this.errorHandler!=null){
            builder.setErrorHandler(this.errorHandler);}
        else {
            builder.setErrorHandler(new DefaultHandler());
        }
        FileInputStream test= new FileInputStream(file);
        doc = builder.parse(test);
        ...
    } catch (Exception e) {...}
    ...
}

目前,我发现自己被迫在解析之前删除 DOCTYPE,这消除了所有问题,以及 DTD 验证......这不是一个很好的解决方案。

I have files on my file system, on Windows XP. I want to parse them using Java (JRE 1.6).

Problem is, I don't understand how Java and Xerces work together when the file path has spaces in it.

If the file has no spaces in its path, all works fine.

If there are spaces, I may have this kind of trouble, even if I call the parser with a FileInputStream instance :

java.net.UnknownHostException: .
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

(sun.net.ftp.FtpClient.openServer ??? Wtf ?)

or else this kind of trouble :

java.net.MalformedURLException: unknown protocol: d
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)

(It says unknown protocol: d because, I guess, the file is on the D drive.)

Has anyone any clue of why that happens, and how to circumvent the problem ? I tried to supply my own EntityResolver but my log tells me it is not even called before the crash.


EDIT:

Here is the code calling the parser.

public Document fileToDom(File file) throws ProcessException {
    Document doc = null;
    try {
        DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = db.newDocumentBuilder();
        if (this.errorHandler!=null){
            builder.setErrorHandler(this.errorHandler);}
        else {
            builder.setErrorHandler(new DefaultHandler());
        }
        FileInputStream test= new FileInputStream(file);
        doc = builder.parse(test);
        ...
    } catch (Exception e) {...}
    ...
}

For the moment I find myself forced to remove the DOCTYPE before the parse, which removes all the problems, and the DTD validation... Not so great a solution.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

酒解孤独 2024-08-03 18:33:38

您只是使用 DocumentBuilder.parse(filename) 吗?

如果是这样,那就失败了,因为它需要一个 URI。 打开文件的 FileInputStream,然后将其传递给 DocumentBuilder.parse(InputStream)

Are you just using DocumentBuilder.parse(filename)?

If so, that's failing because it expects a URI. Open a FileInputStream to the file, and then pass that to DocumentBuilder.parse(InputStream).

じ违心 2024-08-03 18:33:38

试试这个 URI 样式:

文件:///d:/folder/folder%20with%20space/file.xml

Try this URI style:

file:///d:/folder/folder%20with%20space/file.xml

断念 2024-08-03 18:33:38

看起来它正在尝试连接到 doctype 标头中的 URL,以便可以下载它,以便根据下载的 DTD 验证文档。

It looks like it's trying to connect to a URL in the doctype header so it can download it in order to validate the document against the downloaded DTD.

动听の歌 2024-08-03 18:33:38

尝试这个。

InputSource is = new InputSource();
is.setCharacterStream(new StringReader(test));
doc = builder.parse(is);

而不是仅仅解析“测试”

Try this.

InputSource is = new InputSource();
is.setCharacterStream(new StringReader(test));
doc = builder.parse(is);

instead of just parsing the 'test'

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文