Problem parsing quotes with the SAX parser (javax.xml.parsers.SAXParser) on Android API 1.5

Posted 2024-08-27 06:25:37

When using a SAX parser, parsing fails when there is a " in the node content. How can I resolve this? Do I need to convert all " characters?

In other words, anytime I have a quote in a node:

 <node>characters in node containing "quotes"</node>

That node gets butchered into multiple character arrays when the Handler is parsing it. Is this normal behaviour? Why should quotes cause such a problem?

Here is the code I am using:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

 ...

HttpGet httpget = new HttpGet(GATEWAY_URL + "/" + question.getId());
httpget.setHeader("User-Agent", PayloadService.userAgent);
httpget.setHeader("Content-Type", "application/xml");

HttpResponse response = PayloadService.getHttpclient().execute(httpget);
HttpEntity entity = response.getEntity();

if (entity != null) {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    XMLReader xr = sp.getXMLReader();

    ConvoHandler convoHandler = new ConvoHandler();
    xr.setContentHandler(convoHandler);
    xr.parse(new InputSource(entity.getContent()));

    entity.consumeContent();

    messageList = convoHandler.getMessageList();
}


Comments (2)

浅沫记忆 2024-09-03 06:25:37

The error is in your handler class referenced in your most recent comment.

A common error in writing a ContentHandler is to assume the characters method is only going to be called once with all the character data. It can in fact be called multiple times with chunks of the character data, which you have to collect. The chopping up into multiple character arrays is normal behavior.

You probably need to initialize a collector (maybe a StringBuffer) in your startElement method, append to it in your characters method, and then use the collected data in your endElement method, which should be where the message.setText shown in your comment is called.
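
A minimal sketch of that collect-then-use pattern might look like the following. The asker's ConvoHandler was not posted, so CollectingHandler, its parse helper and the getTexts accessor are hypothetical stand-ins, and "node" is simply the element name from the sample XML in the question. Whether the parser fills in localName or qName depends on its namespace settings, so the sketch checks both.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the asker's ConvoHandler (not shown in the thread).
class CollectingHandler extends DefaultHandler {
    private final List<String> texts = new ArrayList<String>();
    private StringBuilder buffer; // non-null only while inside a <node> element

    // Use localName when the parser provides it, otherwise fall back to qName.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.length() == 0) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("node".equals(nameOf(localName, qName))) {
            buffer = new StringBuilder(); // start collecting
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append every chunk.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null && "node".equals(nameOf(localName, qName))) {
            texts.add(buffer.toString()); // the complete text is only known here
            buffer = null;
        }
    }

    List<String> getTexts() {
        return texts;
    }

    // Convenience helper for trying the handler on an in-memory document.
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTexts();
    }
}
```

In the asker's code the equivalent of texts.add(...) would presumably be the message.setText call, moved from characters into endElement.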

若水般的淡然安静女子 2024-09-03 06:25:37

The correct answer has already been given: there is no guarantee that character data is delivered as a single event. One thing to consider is that a parser with a StAX (or xmlpull) "pull" interface might work better; a StAX parser can be forced to report all character data as a single token by enabling coalescing. StAX (and pull parsers in general) are considered a bit more convenient to use than SAX, and there are implementations that run on Android as well (I think the Android SDK even bundles xmlpull); Woodstox and Aalto should work.
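
As a rough illustration of the coalescing option, here is a sketch using the standard javax.xml.stream API and its XMLInputFactory.IS_COALESCING property. The class and method names (CoalescingStaxExample, readNodeTexts) are made up for this example, and "node" is again the element name from the question. It is written against a desktop JVM; on Android a StAX implementation such as Woodstox would presumably have to be added to the project first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of the asker's code.
class CoalescingStaxExample {
    static List<String> readNodeTexts(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<String>();
        boolean inNode = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "node".equals(reader.getLocalName())) {
                    inNode = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "node".equals(reader.getLocalName())) {
                    inNode = false;
                } else if (event == XMLStreamConstants.CHARACTERS && inNode) {
                    // With coalescing on, an uninterrupted run of text arrives here in one piece.
                    texts.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return texts;
    }
}
```

Text that is interrupted by child elements or comments would still arrive as separate events, so this only replaces the manual buffering for simple text-only elements like the one in the question.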
