Parsing a UTF-8 XML file with XmlSlurper

Posted 2024-12-10 10:29:10

I'm trying to parse a Google Atom entry with XmlSlurper. My use case is as follows.

1) Send an Atom XML document to the server with a REST client.

2) Handle the request and parse it on the server side.

I develop my server with Groovy and use XmlSlurper as the parser, but I could not get it to work: parsing fails with a "Content is not allowed in prolog" exception. To find out why, I saved my Atom XML to a file encoded as UTF-8, then read the file and parsed it, and got the same exception. When I saved the same XML to a file encoded as ANSI, it parsed successfully. So I think the problem is with XmlSlurper and UTF-8.

Do you have any idea about this limitation? My Atom XML has to be UTF-8, so how can I parse it? Thanks for your help.

XML:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gd='http://schemas.google.com/g/2005'>
  <category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/contact/2008#contact' />
  <title type='text'>Elizabeth Bennet</title>
  <content type='text'>Notes</content>
  <gd:email rel='http://schemas.google.com/g/2005#work'
    address='[email protected]' />
  <gd:email rel='http://schemas.google.com/g/2005#home'
    address='[email protected]' />
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    (206)555-1212
  </gd:phoneNumber>
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#home'>
    (206)555-1213
  </gd:phoneNumber>
  <gd:im address='[email protected]'
    protocol='http://schemas.google.com/g/2005#GOOGLE_TALK'
    rel='http://schemas.google.com/g/2005#home' />
  <gd:postalAddress rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    1600 Amphitheatre Pkwy Mountain View
  </gd:postalAddress>
</entry>

Read the file and parse it:

 String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml";
 String line = "";
 StringBuilder sb = new StringBuilder();
 BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
 while ((line = br.readLine()) !=null) {
     sb.append(line);
 }
 System.out.println("sb.toString() = " + sb.toString());

 def xmlf = new XmlSlurper().parseText(sb.toString())
    .declareNamespace(gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005')

   println xmlf.title

苏别ゝ 2024-12-17 10:29:10

Try:

String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml"

def xmlf = new XmlSlurper().parse( new File( file ) ).declareNamespace( 
        gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005' )
println xmlf.title

You're going the long way round: XmlSlurper can parse the File directly, so there is no need to read it into a String first.

滴情不沾 2024-12-17 10:29:10

This is the problem:

BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream(file)));
while ((line = br.readLine()) !=null) {
    sb.append(line);
}

That's reading the file with the platform default encoding. If the encoding is wrong, you'll be reading the data incorrectly.
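To make that concrete, here is a small Java sketch (not part of the original answer) showing how the same bytes turn into different characters depending on the charset used to decode them. It assumes the UTF-8 file begins with a byte order mark (EF BB BF), which some Windows editors write when saving as UTF-8; the question does not confirm this. `DecodeDemo` and its members are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // The three bytes of a UTF-8 byte order mark (assumed to start the file).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // new InputStreamReader(stream) with no charset argument uses this value.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        // Decoded as UTF-8, the three bytes become the single character U+FEFF.
        String asUtf8 = decode(UTF8_BOM, StandardCharsets.UTF_8);
        System.out.println("UTF-8 decoding yields " + asUtf8.length() + " char(s)");

        // Decoded as a single-byte charset, each byte becomes its own character,
        // so three stray characters end up in front of the XML declaration.
        String asLatin1 = decode(UTF8_BOM, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 decoding yields " + asLatin1.length() + " char(s)");
    }
}
```

Either way, text appears before `<?xml`, which may be what the parser is complaining about when it reports content in the prolog.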

What you should do is let the XML parser handle it for you. It should be able to detect the encoding itself, based on the first line of data.

I'm not familiar with XmlSlurper but I'd expect it to either be able to parse an input stream (in which case just give it the FileInputStream) or handle the name of the file itself.
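The advice above can be sketched with the JDK's own DOM parser, used here as a stand-in for XmlSlurper so the example needs no Groovy runtime. The sketch again assumes the UTF-8 file starts with a byte order mark, which the question does not confirm. `BomParseDemo`, `sampleWithBom`, `titleFromBytes`, and `failsWhenDecodedAsLatin1` are hypothetical names, and the sample document is a shortened version of the entry in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BomParseDemo {
    // Build an in-memory "file": UTF-8 BOM followed by a small XML document.
    static byte[] sampleWithBom() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<entry><title type='text'>Elizabeth Bennet</title></entry>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Hand the parser raw bytes so it can detect the encoding itself.
    static String titleFromBytes(byte[] data) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(data));
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    // Decode the bytes with the wrong charset first, then parse the resulting text.
    static boolean failsWhenDecodedAsLatin1(byte[] data) throws Exception {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return false;
        } catch (SAXException e) {
            // The stray characters before "<?xml" make the document malformed.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = sampleWithBom();
        System.out.println("Parsed from bytes: " + titleFromBytes(data));
        System.out.println("Fails after wrong decoding: " + failsWhenDecodedAsLatin1(data));
    }
}
```

In Groovy the equivalent move is to call one of XmlSlurper's `parse` overloads that accept a `File` or an `InputStream`, rather than building a String and calling `parseText`.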
