"Content is not allowed in trailing section" exception when parsing XML generated by a C# client in a GAEJ app

Posted on 2024-08-19 06:03:27

I am trying to POST a potentially large chunk of XML from a C# client to a GAEJ (Google App Engine for Java) app and then parse it into a DOM document.

I've managed to get the DocumentBuilder to parse the XML by reading the request body into a string and then trimming it, like so:

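        // Workaround: buffer the whole request body, then trim it before parsing.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        BufferedReader rdr = req.getReader();
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = rdr.readLine()) != null) {
            // readLine() strips the line terminator, so put one back
            result.append(line).append('\n');
        }
        // trim() removes leading/trailing characters <= U+0020, including NUL
        String xml = result.toString().trim();
        Document doc = db.parse(new InputSource(new StringReader(xml)));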

However, the GAEJ app should be as efficient as possible, and reading a potentially large XML input into a string line by line, rather than feeding the source stream directly to the parser, seems wasteful. I would like the following to work:

        Document doc = db.parse(req.getInputStream());

But then I always get "org.xml.sax.SAXParseException: Content is not allowed in trailing section." If I dump the contents of the request's input stream to the console, I can see some box characters after the final closing tag, but I'm not sure how they got there (the client side is using UTF-8 encoding) or how to remove them from the input stream. Thanks!
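
To make the symptom concrete, here is a rough sketch of two things that could be tried. It assumes, without confirmation, that the box characters are NUL (0x00) padding bytes, which is only one possible explanation. The class `TrailingBytes` and its methods `hexAfterLastTag` and `withoutNulBytes` are hypothetical names made up for this illustration; they are not part of the servlet or JAXP APIs. The first method hex-dumps whatever follows the last `>` so the stray bytes can be identified; the second wraps a stream and drops NUL bytes on the fly, which is only reasonable for UTF-8 input because NUL is not a legal XML 1.0 character and never appears inside a multi-byte UTF-8 sequence.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; the names are assumptions.
class TrailingBytes {

    // Returns the bytes that follow the last '>' in the payload as hex,
    // e.g. to check whether the "box characters" are really 0x00 padding.
    static String hexAfterLastTag(byte[] body) {
        int end = body.length - 1;
        while (end >= 0 && body[end] != '>') {
            end--;
        }
        StringBuilder hex = new StringBuilder();
        for (int i = end + 1; i < body.length; i++) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", body[i] & 0xFF));
        }
        return hex.toString();
    }

    // Wraps a stream and silently drops every 0x00 byte.
    // Assumption: the payload is UTF-8 (this would corrupt UTF-16 input).
    static InputStream withoutNulBytes(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public int read() throws IOException {
                int b;
                do {
                    b = super.read();
                } while (b == 0);
                return b; // a data byte, or -1 at end of stream
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                while (true) {
                    int n = super.read(buf, off, len);
                    if (n <= 0) {
                        return n; // end of stream
                    }
                    // Compact the chunk in place, skipping NUL bytes.
                    int kept = 0;
                    for (int i = 0; i < n; i++) {
                        byte b = buf[off + i];
                        if (b != 0) {
                            buf[off + kept++] = b;
                        }
                    }
                    if (kept > 0) {
                        return kept;
                    }
                    // The chunk was all padding; read the next one.
                }
            }
        };
    }
}
```

With this in place, the streaming call would become something like `db.parse(TrailingBytes.withoutNulBytes(req.getInputStream()))`. If the hex dump shows something other than `00`, the filter does not apply, and in either case the cleaner fix is probably on the client side, so that it sends only the bytes of the document itself.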
