Filtering \n characters from an input stream

Posted on 2024-12-08 18:58:46

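I am trying to parse XML from an input stream using a SAX parser. The input stream continuously receives incoming XML from a socket, and '\n' is used as a delimiter between the XML messages. This is what the XML looks like: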

<?xml version="1.0" encoding="UTF-8"?>
<response processor="header" callback="comheader">
    <properties>
        <timezone>Asia%2FBeirut</timezone>
        <rawoffset>7200000</rawoffset>
        <to_date>1319256000000</to_date>
        <dstrawoffset>10800000</dstrawoffset>
    </properties>
</response>
\n
<event type="progress" time="1317788744214">
    <param key="callback">todayactions</param>
    <param key="percent">10</param>
    <param key="msg">MAPPING</param>
</event>
<event type="progress" time="1317788744216">
    <param key="callback">todayactions</param>
    <param key="percent">20</param><param key="msg">MAPPING</param>
</event>
\n
<?xml version="1.0" encoding="UTF-8"?>
<response processor="header" callback="comheader">
    <properties>
        <timezone>Asia%2FBeirut</timezone>
        <rawoffset>7200000</rawoffset>
        <to_date>1319256000000</to_date>
        <dstrawoffset>10800000</dstrawoffset>
    </properties>
</response>

This worked perfectly for our iPhone project, because there we took the characters up to each \n, stored them in a string, and used a DOM parser.

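But when I tried to do the same on Android, a string was not an option because it gave us an OutOfMemory exception. So we pass the input stream directly to the SAX parser. It works until the \n character, and after that it throws this exception: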

org.apache.harmony.xml.ExpatParser$ParseException: At line 2, column 0: junk after document element
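The failure can be reproduced off-device with the SAX parser bundled in a desktop JDK. The sketch below is only an illustration: the class name JunkAfterRootDemo and the shortened sample strings are made up, and the JDK parser words its error differently from Android's Expat-based one. It suggests that a trailing '\n' by itself is accepted, while a second top-level element after the first one is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original project.
class JunkAfterRootDemo {
    /** Returns true if the text parses as a single well-formed XML document. */
    static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // malformed input, e.g. content after the root element
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String one = "<response><properties/></response>";
        System.out.println(parses(one));                                  // true
        System.out.println(parses(one + "\n"));                           // true
        System.out.println(parses(one + "\n<event type=\"progress\"/>")); // false
    }
}
```

If this holds for the Android parser as well, stripping '\n' would not be enough on its own, because the stream would still contain several root elements.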

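So I tried to filter the input stream to skip the '\n' characters. I created a FilterStreamReader, but without success; it seems my read function isn't doing the job. Here is my code.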

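import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class FilterStreamReader extends InputStreamReader {
    public FilterStreamReader(InputStream in, String enc)
            throws UnsupportedEncodingException {
        super(in, enc);
    }

    // Only the bulk read is filtered; the single-character read() is not overridden.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int read = super.read(cbuf, off, len);
            if (read == -1) {
                return -1;
            }

            // Compact the buffer in place, keeping every character except '\n'.
            // (The earlier version compared the count 'read' with '\n'
            // instead of the character cbuf[readPos].)
            int pos = off;
            for (int readPos = off; readPos < off + read; readPos++) {
                if (cbuf[readPos] != '\n') {
                    cbuf[pos++] = cbuf[readPos];
                }
            }

            // If the chunk held only newlines, read again rather than return 0.
            if (pos > off) {
                return pos - off;
            }
        }
    }
}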

Can someone help me filter the \n out of an input stream?

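Edit
Based on what graham said, I was able to parse the whole data by removing all the doc types and adding my own start and end tag. So I'm no longer sure that filtering '\n' alone is my real problem. How can you parse XML that keeps arriving like this?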
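One way the workaround from the edit might be done without buffering the whole stream is sketched below. It rests on several assumptions: "doc types" is read as the <?xml ...?> declarations, the class WrappedXmlReader and the root name "stream" are invented for illustration, and "<?" is assumed never to appear inside CDATA sections, comments, or attribute values. The reader emits a synthetic opening tag, passes the source through while skipping every "<?" ... "?>" span, and emits the closing tag at end of stream, so a SAX parser sees one well-formed document.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: wraps a multi-document feed in one synthetic root. */
class WrappedXmlReader extends Reader {
    private static final String PREFIX = "<stream>";  // made-up root name
    private static final String SUFFIX = "</stream>";
    private final PushbackReader in;
    private int prefixPos = 0;
    private int suffixPos = 0;
    private boolean sourceDone = false;

    WrappedXmlReader(Reader source) {
        this.in = new PushbackReader(source, 1);
    }

    @Override
    public int read() throws IOException {
        if (prefixPos < PREFIX.length()) {
            return PREFIX.charAt(prefixPos++);
        }
        if (!sourceDone) {
            int c = nextFiltered();
            if (c != -1) {
                return c;
            }
            sourceDone = true;
        }
        if (suffixPos < SUFFIX.length()) {
            return SUFFIX.charAt(suffixPos++);
        }
        return -1;
    }

    /** Returns the next source character, skipping each "<?" ... "?>" span. */
    private int nextFiltered() throws IOException {
        while (true) {
            int c = in.read();
            if (c != '<') {
                return c; // ordinary character, or -1 at end of stream
            }
            int next = in.read();
            if (next != '?') {
                if (next != -1) {
                    in.unread(next); // not a declaration: put the char back
                }
                return c;
            }
            int prev = 0;
            int cur;
            while ((cur = in.read()) != -1 && !(prev == '?' && cur == '>')) {
                prev = cur;
            }
            if (cur == -1) {
                return -1; // stream ended inside a declaration
            }
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        do { // block for one char, then take only what is already available
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + n++] = (char) c;
        } while (n < len && in.ready());
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class StreamWrapDemo {
    /** Parses the wrapped feed and returns element names in document order. */
    static List<String> elementNames(Reader source) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new WrappedXmlReader(source)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

For example, StreamWrapDemo.elementNames(new StringReader("<?xml version=\"1.0\"?><a/>\n<b/>")) should yield the names stream, a, b. With a socket, the source would presumably be an InputStreamReader over the socket's input stream with an explicit encoding; the handler then receives events as elements arrive, and the closing tag is only produced once the connection ends.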
