Parsing a UTF-8 XML file with XmlSlurper
I'm trying to parse a Google Atom entry with XmlSlurper. My use case is something like this:
1) Send an Atom XML document to the server with a REST client.
2) Handle the request and parse it on the server side.
I develop my server with Groovy and use XmlSlurper as the parser. But I couldn't get it to work: I get the "Content is not allowed in prolog" exception. Then I tried to find out why it happened. I saved my Atom XML to a file encoded as UTF-8, then read the file and parsed it, and I got the same exception. But when I saved the Atom XML to a file encoded as ANSI, it parsed successfully. So I think the problem is about XmlSlurper and "UTF-8".
Do you have any idea about this limitation? My Atom XML has to be UTF-8, so how can I parse it? Thanks for your help.
XML:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:atom='http://www.w3.org/2005/Atom'
xmlns:gd='http://schemas.google.com/g/2005'>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/contact/2008#contact' />
<title type='text'>Elizabeth Bennet</title>
<content type='text'>Notes</content>
<gd:email rel='http://schemas.google.com/g/2005#work'
address='[email protected]' />
<gd:email rel='http://schemas.google.com/g/2005#home'
address='[email protected]' />
<gd:phoneNumber rel='http://schemas.google.com/g/2005#work'
primary='true'>
(206)555-1212
</gd:phoneNumber>
<gd:phoneNumber rel='http://schemas.google.com/g/2005#home'>
(206)555-1213
</gd:phoneNumber>
<gd:im address='[email protected]'
protocol='http://schemas.google.com/g/2005#GOOGLE_TALK'
rel='http://schemas.google.com/g/2005#home' />
<gd:postalAddress rel='http://schemas.google.com/g/2005#work'
primary='true'>
1600 Amphitheatre Pkwy Mountain View
</gd:postalAddress>
</entry>
Reading the file and parsing it:
String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml";
String line = "";
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
while ((line = br.readLine()) !=null) {
sb.append(line);
}
System.out.println("sb.toString() = " + sb.toString());
def xmlf = new XmlSlurper().parseText(sb.toString())
.declareNamespace(gContact:'http://schemas.google.com/contact/2008',
gd:'http://schemas.google.com/g/2005')
println xmlf.title
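The ANSI-versus-UTF-8 difference is consistent with a byte order mark: some editors, older versions of Windows Notepad among them, start a file saved as UTF-8 with the bytes EF BB BF, while an ASCII-only file saved as ANSI has no such prefix. The following Groovy sketch is a hypothetical reproduction, not the asker's code. It assumes Groovy 3 or later and uses ISO-8859-1 as a stand-in for the Windows default code page; decoded that way, the BOM becomes three ordinary characters in front of <?xml, which parseText rejects.

```groovy
// Groovy 3+ package; Groovy 2.x shipped this class as groovy.util.XmlSlurper.
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets
import org.xml.sax.SAXParseException

// Hypothetical, trimmed-down entry; ASCII only, so "ANSI" and UTF-8 bytes are identical.
String xml = '<?xml version="1.0" encoding="UTF-8"?><entry><title>Elizabeth Bennet</title></entry>'

// What an editor that writes a BOM produces for "Save as UTF-8": EF BB BF + the text.
byte[] utf8WithBom = ('\uFEFF' + xml).getBytes(StandardCharsets.UTF_8)
// What "Save as ANSI" produces for the same ASCII text: no prefix.
byte[] ansi = xml.getBytes(StandardCharsets.ISO_8859_1)

// Mimic the question's reader: decode with a non-UTF-8 charset (ISO-8859-1 stands in
// for the Windows default code page), then hand the resulting String to parseText.
String fromBomFile = new String(utf8WithBom, StandardCharsets.ISO_8859_1)
String fromAnsiFile = new String(ansi, StandardCharsets.ISO_8859_1)

// The three BOM bytes are now three characters in front of "<?xml".
println fromBomFile.take(8)

SAXParseException failure = null
try {
    new XmlSlurper().parseText(fromBomFile)
} catch (SAXParseException e) {
    failure = e
}
println "BOM file: ${failure?.message}"

def entry = new XmlSlurper().parseText(fromAnsiFile)
println "ANSI file: ${entry.title}"
```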
Try:
You're going the long way round
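The code that accompanied this answer did not survive on this page. Assuming the suggestion was to skip the BufferedReader/StringBuilder loop and let XmlSlurper open the file itself, a minimal Groovy sketch (Groovy 3 or later assumed) could look like this; the temporary file and its trimmed content are illustrative stand-ins for create.xml.

```groovy
// Groovy 3+ package; Groovy 2.x shipped this class as groovy.util.XmlSlurper.
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets

// Illustrative stand-in for create.xml: UTF-8 with a BOM, trimmed to two elements.
File file = File.createTempFile('create', '.xml')
file.deleteOnExit()
file.bytes = ('\uFEFF<?xml version="1.0" encoding="UTF-8"?>' +
        "<entry xmlns:gd='http://schemas.google.com/g/2005'>" +
        "<title type='text'>Elizabeth Bennet</title>" +
        "<gd:phoneNumber rel='http://schemas.google.com/g/2005#work'>(206)555-1212</gd:phoneNumber>" +
        '</entry>').getBytes(StandardCharsets.UTF_8)

// No reader, no StringBuilder: the parser opens the file as bytes, so it can
// recognise the BOM and the encoding declaration without any platform default.
def xmlf = new XmlSlurper().parse(file)
        .declareNamespace(gd: 'http://schemas.google.com/g/2005')

println xmlf.title
println xmlf.'gd:phoneNumber'
```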
This is the problem:
new InputStreamReader(new FileInputStream(file))
That's reading the file with the platform default encoding. If that isn't the encoding the file was actually written in, you'll be reading the data incorrectly.
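To make the point about the platform default concrete, this Groovy sketch (Groovy 3 or later assumed) decodes the same UTF-8 bytes twice: once as ISO-8859-1, standing in for a non-UTF-8 default, and once as UTF-8. The sample title is made up. Note that the wrong decoding does not necessarily throw; here it quietly produces a different string.

```groovy
// Groovy 3+ package; Groovy 2.x shipped this class as groovy.util.XmlSlurper.
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets

// Hypothetical sample: no BOM, one non-ASCII character (U+00E9, UTF-8 bytes C3 A9).
String xml = '<?xml version="1.0" encoding="UTF-8"?><entry><title>Jos\u00E9</title></entry>'
byte[] bytes = xml.getBytes(StandardCharsets.UTF_8)

// Wrong charset: C3 and A9 decode to two separate characters, U+00C3 and U+00A9.
String wrongText = new String(bytes, StandardCharsets.ISO_8859_1)
// The charset the bytes were actually written in: U+00E9 survives.
String rightText = new String(bytes, StandardCharsets.UTF_8)

// Both strings are well-formed XML, so neither parse fails; one is silently wrong.
def wrongTitle = new XmlSlurper().parseText(wrongText).title.text()
def rightTitle = new XmlSlurper().parseText(rightText).title.text()

println "ISO-8859-1: $wrongTitle"
println "UTF-8:      $rightTitle"
```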
What you should do is let the XML parser handle it for you. It should be able to detect the encoding itself from the start of the data: a byte order mark, or the encoding declaration on the first line.
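Here is a Groovy sketch (Groovy 3 or later assumed) of letting the parser work out the encoding: the raw bytes go to XmlSlurper.parse(InputStream), which reads the byte order mark or the encoding declaration itself. The two sample documents are hypothetical, one UTF-8 with a BOM and one declared and encoded as ISO-8859-1; both come back with the same text.

```groovy
// Groovy 3+ package; Groovy 2.x shipped this class as groovy.util.XmlSlurper.
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets

// Parse raw bytes and let the XML parser decide how to decode them.
def titleOf = { byte[] raw ->
    new XmlSlurper().parse(new ByteArrayInputStream(raw)).title.text()
}

// UTF-8 with a leading BOM (U+FEFF encodes to EF BB BF).
byte[] utf8 = ('\uFEFF<?xml version="1.0" encoding="UTF-8"?>' +
        '<entry><title>Jos\u00E9</title></entry>').getBytes(StandardCharsets.UTF_8)

// Same text, declared and encoded as ISO-8859-1 (U+00E9 is the single byte E9 here).
byte[] latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' +
        '<entry><title>Jos\u00E9</title></entry>').getBytes(StandardCharsets.ISO_8859_1)

def fromUtf8 = titleOf(utf8)
def fromLatin1 = titleOf(latin1)
println fromUtf8
println fromLatin1
```

On the server side the same idea would mean passing the request body's InputStream to parse rather than first turning it into a String with a guessed charset; how to obtain that stream depends on the framework, which the question does not name.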
I'm not familiar with XmlSlurper, but I'd expect it to be able either to parse an input stream (in which case just give it the FileInputStream) or to handle the name of the file itself.
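For what it's worth, XmlSlurper does accept both: it has parse(InputStream) and parse(File) overloads, plus parse(String), which treats its argument as a URI rather than a file-system path, so a Windows path like the one in the question should be passed as a File or converted with toURI() first. A small Groovy sketch of the stream and URI forms (Groovy 3 or later assumed; the temporary file is illustrative):

```groovy
// Groovy 3+ package; Groovy 2.x shipped this class as groovy.util.XmlSlurper.
import groovy.xml.XmlSlurper
import java.nio.charset.StandardCharsets

// Illustrative stand-in for the question's create.xml.
File file = File.createTempFile('create', '.xml')
file.deleteOnExit()
file.bytes = '<?xml version="1.0" encoding="UTF-8"?><entry><title>Elizabeth Bennet</title></entry>'
        .getBytes(StandardCharsets.UTF_8)

// 1) Give it the stream; withInputStream closes the stream when the closure returns.
//    XmlSlurper builds its whole tree during parse, so the result outlives the stream.
def viaStream = file.withInputStream { InputStream input -> new XmlSlurper().parse(input) }

// 2) parse(String) expects a URI, so convert the File instead of passing a raw path.
def viaUri = new XmlSlurper().parse(file.toURI().toString())

println viaStream.title
println viaUri.title
```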