Reading large chunks of XML data from a socket and parsing them on the fly
I am working on an Android client that reads a continuous stream of XML data from my Java server over a TCP socket. The server sends a '\n' character as a delimiter between consecutive responses. A model of the stream is given below.
<response1>
    <datas>
        <data>
            .....
            .....
        </data>
        <data>
            .....
            .....
        </data>
        ........
        ........
    </datas>
</response1>\n <--- \n acts as delimiter ---/>
<response2>
    <datas>
        <data>
            .....
            .....
        </data>
        <data>
            .....
            .....
        </data>
        ........
        ........
    </datas>
</response2>\n
I hope the structure is clear now. The server transmits the responses zlib-compressed, so I first have to inflate whatever I read from the socket, then split it into responses at the delimiter, and then parse each one. I am using SAX to parse the XML.
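To make the inflate step concrete, here is a minimal sketch of reading zlib-compressed data in small chunks with `InflaterInputStream`, so that nothing larger than the read buffer is held in memory. The class and method names (`InflateSketch`, `deflate`, `countDelimiters`) are made up for illustration, and `deflate` only stands in for the server side, which is not shown in this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class InflateSketch {
    // Stand-in for the server (an assumption): zlib-compress a string.
    static byte[] deflate(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream zip = new DeflaterOutputStream(out)) {
            zip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Inflate chunk by chunk and count the '\n' delimiters.
    // Only the 4 KB buffer is kept in memory, never a whole response.
    static int countDelimiters(InputStream compressed) throws IOException {
        int count = 0;
        byte[] buf = new byte[4096];
        try (InputStream in = new InflaterInputStream(compressed)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

In the real client the compressed source would be the socket's input stream rather than a byte array.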
Now my main problem is that an XML response coming from the server can be very large (in the range of 3 to 4 MB). So:
- To separate responses at the delimiter (\n), I have to use a StringBuilder to store the response blocks as they are read from the socket, and on some phones a StringBuilder cannot hold strings in the megabyte range. It fails with an OutOfMemoryError, and from threads like this I learned that keeping large strings in memory (even temporarily) is not a good idea.
- Next I tried passing the inflatorReadStream (which in turn takes its data from the socket input stream) directly to the SAX parser as its input stream, without separating the XML myself and relying on SAX's ability to find the end of the document from its tags. This time one response is parsed successfully, but on reaching the '\n' delimiter SAX throws an ExpatParserParseException saying "junk after document element".
- After catching that ExpatParserParseException I tried to read again, but after throwing the exception the SAX parser closes the stream, so when I try to read or parse again I get an IOException saying the input stream is closed.
A code snippet of what I have done is given below (removed all unrelated try catch blocks for clarity).
private Socket clientSocket = null;
DataInputStream readStream = null;
DataOutputStream writeStream = null;
private StringBuilder incompleteResponse = null;
private AppContext context = null;

public boolean connectToHost(String ipAddress, int port, AppContext myContext) {
    context = myContext;
    website = site;
    InetAddress serverAddr = null;
    serverAddr = InetAddress.getByName(website.mIpAddress);
    clientSocket = new Socket(serverAddr, port);
    // If connected, create the read and write stream objects.
    readStream = new DataInputStream(new InflaterInputStream(clientSocket.getInputStream()));
    writeStream = new DataOutputStream(clientSocket.getOutputStream());
    Thread readThread = new Thread() {
        @Override
        public void run() {
            ReadFromSocket();
        }
    };
    readThread.start();
    return true;
}

public void ReadFromSocket() {
    while (true) {
        InputSource xmlInputSource = new InputSource(readStream);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = null;
        XMLReader xr = null;
        try {
            sp = spf.newSAXParser();
            xr = sp.getXMLReader();
            ParseHandler xmlHandler = new ParseHandler(context.getSiteListArray().indexOf(website), context);
            xr.setContentHandler(xmlHandler);
            xr.parse(xmlInputSource);
            // postSuccessfullParsingNotification();
        } catch (SAXException e) {
            e.printStackTrace();
            postSuccessfullParsingNotification();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
            postSocketDisconnectionBroadcast();
            break;
        } catch (IOException e) {
            postSocketDisconnectionBroadcast();
            e.printStackTrace();
            e.toString();
            break;
        } catch (Exception e) {
            postSocketDisconnectionBroadcast();
            e.printStackTrace();
            break;
        }
    }
}
And now my questions are:
- Is there any way to make the SAX parser ignore junk characters after one XML response, instead of throwing an exception and closing the stream?
- If not, is there any way to avoid the out-of-memory error with StringBuilder? To be frank, I am not expecting a positive answer on this. Any workaround?
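One possible workaround for both questions, sketched under assumptions, is to wrap the inflated stream in a filter that reports end-of-stream at each '\n' and ignores `close()`. Each `parse` call then sees exactly one document, no StringBuilder is needed, and the parser cannot close the socket stream. `DelimitedInputStream`, `nextResponse`, `DelimitedParseSketch`, and `parseAll` are hypothetical names, not part of any library. The sketch assumes that '\n' never occurs inside a response; if the real responses contain line breaks, it would split them in the wrong places and a different framing (for example a length prefix) would be needed.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: exposes the bytes up to the next '\n' as one complete stream. */
class DelimitedInputStream extends FilterInputStream {
    private static final int NONE = -2;   // marker: no byte is held in 'peeked'
    private int peeked = NONE;            // first byte of the next response, read ahead
    private boolean atDelimiter = true;   // positioned between two responses
    private boolean sourceEnded = false;  // underlying stream reached EOF

    DelimitedInputStream(InputStream in) { super(in); }

    /** Moves to the next response; returns false once the source is exhausted. */
    boolean nextResponse() throws IOException {
        while (!atDelimiter && !sourceEnded) read();   // drop whatever the parser left unread
        if (sourceEnded) return false;
        int b;
        do { b = in.read(); } while (b == '\n');       // skip empty lines
        if (b == -1) { sourceEnded = true; return false; }
        peeked = b;
        atDelimiter = false;
        return true;
    }

    @Override public int read() throws IOException {
        if (atDelimiter || sourceEnded) return -1;
        int b = (peeked != NONE) ? peeked : in.read();
        peeked = NONE;
        if (b == -1) { sourceEnded = true; return -1; }
        if (b == '\n') { atDelimiter = true; return -1; }  // report the delimiter as EOF
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int b = read();
            if (b == -1) break;
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override public int available() { return 0; }             // no estimate offered
    @Override public boolean markSupported() { return false; }
    @Override public void close() { /* keep the socket stream open for the next response */ }
}

class DelimitedParseSketch {
    /** Parses every response in 'source' and returns the root element names that were seen. */
    static List<String> parseAll(InputStream source) throws Exception {
        final List<String> roots = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Buffer the source because the filter reads it one byte at a time.
        DelimitedInputStream in = new DelimitedInputStream(new BufferedInputStream(source));
        while (in.nextResponse()) {
            DefaultHandler handler = new DefaultHandler() {
                private int depth = 0;
                @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (depth++ == 0) roots.add(qName);   // depth 0 means a response root element
                }
                @Override public void endElement(String uri, String local, String qName) {
                    depth--;
                }
            };
            factory.newSAXParser().parse(in, handler);    // a fresh parser per response
        }
        return roots;
    }
}
```

In the client, `source` would be the `InflaterInputStream` over the socket, and the anonymous handler would be replaced by the existing `ParseHandler`. Because `nextResponse` drains to the next delimiter, the loop can also continue after a `SAXException` in one malformed response.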
Comments (2)
If your SAX parser supports a push model (where you push raw data chunks into it yourself and it fires events as it parses the raw data), then you can simply push your own initial XML tag at the beginning of the SAX session. That will become the top-level document tag, then you can push the responses as you receive them and they will be second-level tags as far as SAX is concerned. That way, you can push multiple responses in the same SAX session, and then in the OnTagOpen event (or whatever you are using), you will know when a new response begins when you detect its tag name at level 1.
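The standard Java SAX API pulls from an `InputStream` instead of accepting pushed chunks, but the same effect can be approximated by prepending a synthetic root tag with `SequenceInputStream`. The sketch below takes that substitute approach; it is not a push parser. `WrappedStreamSketch`, `ResponseHandler`, `parseWrapped`, and the `<stream>` tag are hypothetical names. It assumes the responses carry no XML declaration or DOCTYPE of their own, since those are not allowed inside a root element. How promptly events fire on Android's Expat-based parser depends on its internal buffering, which is not verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class WrappedStreamSketch {
    /** Records each completed second-level element, i.e. each server response. */
    static class ResponseHandler extends DefaultHandler {
        final List<String> completed = new ArrayList<>();
        private int depth = 0;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            depth++;   // depth 1 = synthetic root, depth 2 = a response element
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (depth == 2) {
                completed.add(qName);   // one whole response has been parsed
            }
            depth--;
        }
    }

    /** Parses all responses as children of one synthetic root element. */
    static List<String> parseWrapped(InputStream responses) throws Exception {
        InputStream open = new ByteArrayInputStream("<stream>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</stream>".getBytes(StandardCharsets.UTF_8));
        // The parser sees: <stream> response1 \n response2 \n ... </stream>
        InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, responses, close)));
        ResponseHandler handler = new ResponseHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        return handler.completed;
    }
}
```

The '\n' delimiters become plain whitespace between child elements, so the parser no longer reports junk after the document element. On a live socket the closing tag is only reached once the server closes the connection, so per-response work would go into `endElement` at depth 2 rather than after `parse` returns.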