Filtering the \n character from an input stream
I am trying to parse XML from an InputStream using a SAX parser. The InputStream receives incoming XML continuously from a socket, and '\n' is used as a delimiter between the XML data. This is what the XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<response processor="header" callback="comheader">
<properties>
<timezone>Asia%2FBeirut</timezone>
<rawoffset>7200000</rawoffset>
<to_date>1319256000000</to_date>
<dstrawoffset>10800000</dstrawoffset>
</properties>
</response>
\n
<event type="progress" time="1317788744214">
<param key="callback">todayactions</param>
<param key="percent">10</param>
<param key="msg">MAPPING</param>
</event>
<event type="progress" time="1317788744216">
<param key="callback">todayactions</param>
<param key="percent">20</param><param key="msg">MAPPING</param>
</event>
\n
<?xml version="1.0" encoding="UTF-8"?>
<response processor="header" callback="comheader">
<properties>
<timezone>Asia%2FBeirut</timezone>
<rawoffset>7200000</rawoffset>
<to_date>1319256000000</to_date>
<dstrawoffset>10800000</dstrawoffset>
</properties>
</response>
This worked perfectly for our iPhone project, because we read the characters up to \n, stored them in a string, and used the DOM parser.
But when I tried to do the same on Android, a string was not an option because it gave us an OutOfMemory exception. So we passed the InputStream directly to the SAX parser. It works until the \n character; after that it throws this exception:
org.apache.harmony.xml.ExpatParser$ParseException: At line 2, column 0: junk after document element
So I tried to filter the InputStream to skip the '\n' character. I created a FilterStreamReader, but it did not work; it seems my read function isn't doing the job. Here is my code.
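import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class FilterStreamReader extends InputStreamReader {
    public FilterStreamReader(InputStream in, String enc)
            throws UnsupportedEncodingException {
        super(in, enc);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            // 'read' is a character count, not a character.
            int read = super.read(cbuf, off, len);
            if (read == -1) {
                return -1;
            }
            // Compact the buffer in place, keeping every char except '\n'.
            // (The original compared the count 'read' to '\n' and kept the wrong chars.)
            kept = 0;
            for (int readPos = off; readPos < off + read; readPos++) {
                if (cbuf[readPos] != '\n') {
                    cbuf[off + kept++] = cbuf[readPos];
                }
            }
        } while (kept == 0); // read again rather than return 0 when only '\n' arrived
        return kept;
    }
}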
Can someone help me filter the \n out of an InputStream?
Edit
Based on what graham said, I was able to parse the whole data by removing all the doc types and adding my own start and end tag. So I'm no longer sure that my problem is only about filtering '\n'. How can you parse XML that keeps arriving like this?
Comments (1)
The problem isn't the \n. It's that after the first </response> tag, the parser thinks the document is complete.
This data isn't valid XML. You should wrap everything inside a single top-level node. Also, you can't have a second <?xml version="1.0" encoding="UTF-8"?> declaration part-way through the document.
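One way to apply this without buffering the whole stream in a string is to wrap the socket's Reader so that the parser sees a single document. The following is only a rough sketch under some assumptions: the class name WrappedXmlReader, the helper elementNames, and the synthetic root element name "stream" are all made up for illustration, and the reader drops everything between "<?" and "?>", so it would also discard real processing instructions and would misbehave if "<?" appeared inside a comment or CDATA section. It reads the source one character at a time, so the source should probably be a BufferedReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: presents concatenated XML fragments as one document. */
public class WrappedXmlReader extends Reader {
    private final Reader in;
    // Characters ready to hand to the parser; starts with the synthetic root tag.
    private final StringBuilder pending = new StringBuilder("<stream>");
    private boolean sourceDone = false;

    public WrappedXmlReader(Reader in) {
        this.in = in;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pending.length() == 0) {
            if (sourceDone) {
                return -1;
            }
            fill();
        }
        int n = Math.min(len, pending.length());
        pending.getChars(0, n, cbuf, off);
        pending.delete(0, n);
        return n;
    }

    /** Pulls the next char(s) from the source, skipping any "<? ... ?>" section. */
    private void fill() throws IOException {
        int c = in.read();
        if (c == -1) {
            pending.append("</stream>"); // close the synthetic root at end of stream
            sourceDone = true;
            return;
        }
        if (c != '<') {
            pending.append((char) c);
            return;
        }
        int next = in.read();
        if (next == '?') {
            // Discard everything up to and including the closing "?>".
            int prev = 0;
            int cur;
            while ((cur = in.read()) != -1 && !(prev == '?' && cur == '>')) {
                prev = cur;
            }
        } else {
            pending.append('<');
            if (next != -1) {
                pending.append((char) next);
            }
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Demo: collects element names in document order using a SAX handler. */
    public static List<String> elementNames(Reader source) throws Exception {
        final List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new WrappedXmlReader(source)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        names.add(qName);
                    }
                });
        return names;
    }
}
```

With a wrapper like this, the '\n' delimiters simply become whitespace text inside the synthetic root, so they no longer need to be filtered. The SAX handler receives events as data arrives; the closing root tag is only emitted once the socket's stream ends.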