使用Java解析XML文件和文件路径中的空格
我的文件系统(Windows XP 上)上有文件。 我想使用 Java (JRE 1.6) 解析它们。
问题是,我不明白当文件路径中有空格时 Java 和 Xerces 如何协同工作。
如果文件路径中没有空格,则一切正常。
如果有空格,即使我使用 FileInputStream 实例调用解析器,我也可能会遇到这种麻烦:(
java.net.UnknownHostException: .
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.NetworkClient.openServer(Unknown Source)
at sun.net.ftp.FtpClient.openServer(Unknown Source)
at sun.net.ftp.FtpClient.openServer(Unknown Source)
at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
sun.net.ftp.FtpClient.openServer
?? ?Wtf?)
否则会遇到这种麻烦:(
java.net.MalformedURLException: unknown protocol: d
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
它说未知协议:d
,因为我猜该文件位于D驱动器上。)
有谁知道为什么会发生这种情况以及如何发生吗 ?规避问题? 我尝试提供自己的 EntityResolver 但我的日志告诉我它在崩溃之前甚至没有被调用。
编辑:
这是调用解析器的代码。
public Document fileToDom(File file) throws ProcessException {
Document doc = null;
try {
DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = db.newDocumentBuilder();
if (this.errorHandler!=null){
builder.setErrorHandler(this.errorHandler);}
else {
builder.setErrorHandler(new DefaultHandler());
}
FileInputStream test= new FileInputStream(file);
doc = builder.parse(test);
...
} catch (Exception e) {...}
...
}
目前,我发现自己被迫在解析之前删除 DOCTYPE,这消除了所有问题,以及 DTD 验证......这不是一个很好的解决方案。
I have files on my file system, on Windows XP. I want to parse them using Java (JRE 1.6).
Problem is, I don't understand how Java and Xerces work together when the file path has spaces in it.
If the file has no spaces in its path, all works fine.
If there are spaces, I may have this kind of trouble, even if I call the parser with a FileInputStream instance :
java.net.UnknownHostException: .
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.NetworkClient.openServer(Unknown Source)
at sun.net.ftp.FtpClient.openServer(Unknown Source)
at sun.net.ftp.FtpClient.openServer(Unknown Source)
at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
(sun.net.ftp.FtpClient.openServer
??? Wtf ?)
or else this kind of trouble :
java.net.MalformedURLException: unknown protocol: d
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
(It says unknown protocol: d
because, I guess, the file is on the D drive.)
Has anyone any clue of why that happens, and how to circumvent the problem ? I tried to supply my own EntityResolver but my log tells me it is not even called before the crash.
EDIT:
Here is the code calling the parser.
public Document fileToDom(File file) throws ProcessException {
Document doc = null;
try {
DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = db.newDocumentBuilder();
if (this.errorHandler!=null){
builder.setErrorHandler(this.errorHandler);}
else {
builder.setErrorHandler(new DefaultHandler());
}
FileInputStream test= new FileInputStream(file);
doc = builder.parse(test);
...
} catch (Exception e) {...}
...
}
For the moment I find myself forced to remove the DOCTYPE before the parse, which removes all the problems, and the DTD validation... Not so great a solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您只是使用
DocumentBuilder.parse(filename)
吗?如果是这样,那就失败了,因为它需要一个 URI。 打开文件的
FileInputStream
,然后将其传递给DocumentBuilder.parse(InputStream)
。Are you just using
DocumentBuilder.parse(filename)
?If so, that's failing because it expects a URI. Open a
FileInputStream
to the file, and then pass that toDocumentBuilder.parse(InputStream)
.试试这个 URI 样式:
Try this URI style:
看起来它正在尝试连接到 doctype 标头中的 URL,以便可以下载它,以便根据下载的 DTD 验证文档。
It looks like it's trying to connect to a URL in the doctype header so it can download it in order to validate the document against the downloaded DTD.
尝试这个。
而不是仅仅解析“测试”
Try this.
instead of just parsing the 'test'