Parsing UTF-8 files on Android

Posted on 2024-12-11 18:13:11

I have some .xml files that are encoded in UTF-8. But whenever I try to parse them on my tablet (Lenovo IdeaPad, Android 3.1), I get the same error:

org.xml.SAXParseException: Unexpected token (position: TEXT @1:2 in 
java.io.StringReader@40bdaef8).

These are the lines that throw the exception:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlData));
Document doc = db.parse(inputSource); // This line throws exception

Here is my input:

public String getFromFile(ASerializer aserializer) {
    String filename = aserializer.toLocalResource();
    String data = new String();
    try {
        InputStream stream = _context.getResources().getAssets().open(filename);
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        StringBuilder str = new StringBuilder();
        String line = null;
        while ((line = reader.readLine()) != null) {
            str.append(line);
        }
        stream.close();
        data = str.toString();
    } catch (Exception e) {
    }
    return data;
}

XML File:

<Results>
    <Result title="08/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
        <Field title="Company three" value="093587125"/>
        <Field title="Company four" value="095608977"/>
    </Result>
    <Result title="11/07/2011">
        <Field title="Company one" value="030589674"/>
        <Field title="Company two" value="081357852"/>
    </Result>
</Results>

I don't want to convert them to ANSI, so is there any way to make db.parse() work?

Comments (3)

红衣飘飘貌似仙 2024-12-18 18:13:11

At this line:

BufferedReader reader = new BufferedReader(new InputStreamReader(stream));

You're reading from stream using the platform default encoding. That's almost certainly not what you want. You'd need to check the XML for the actual encoding, and the correct way to do that is somewhat complicated.

Luckily, every sane XML parser (including the Java/Android one) can do that on its own. To make the XML parser do that, simply pass in the stream itself instead of trying to read it manually.

InputSource inputSource = new InputSource(stream);
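
As a minimal sketch of that suggestion, the raw byte stream can go straight into the parser. The names `StreamParseSketch` and `parse` are made up for illustration and are not part of any library; the sketch assumes the file is well-formed XML.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class StreamParseSketch {
    // Hypothetical helper: gives the parser the undecoded bytes so that it,
    // not an InputStreamReader, decides how to decode them.
    static Document parse(InputStream stream) throws Exception {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource(InputStream) is a byte stream source; no charset is chosen here.
            return db.parse(new InputSource(stream));
        } finally {
            // Close the stream whether or not parsing succeeded.
            stream.close();
        }
    }
}
```

In the question's code, the argument would be the stream returned by `_context.getResources().getAssets().open(filename)`. The sketch uses try/finally rather than try-with-resources on the assumption that it has to build with an older Android toolchain.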

一桥轻雨一伞开 2024-12-18 18:13:11

You are quite likely using an XML file with a BOM (byte order mark).

Either use an API that detects the encoding from the BOM.

Alternatively, preprocess the file so that no BOM is present.
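
For the preprocessing route, one possible sketch is shown here. `BomSketch`, `stripBom` and `skipUtf8Bom` are hypothetical names, and only the UTF-8 BOM (bytes EF BB BF, which decode to U+FEFF) is handled; UTF-16 and UTF-32 BOMs are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class BomSketch {
    // Removes a leading U+FEFF from text that has already been decoded.
    // This only helps if the bytes were decoded as UTF-8 in the first place.
    static String stripBom(String text) {
        if (text.length() > 0 && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    // Returns a stream positioned just after a UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the whole file.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

With the question's code, `stripBom(xmlData)` could be applied before the string is wrapped in a `StringReader`, or `skipUtf8Bom(stream)` before the stream is wrapped in an `InputStreamReader`.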

忱杏 2024-12-18 18:13:11

Your Java string is UTF-16 encoded by default. If you can't use InputStream as @Joachim Sauer suggested, then try this:

Document doc = db.parse(new ByteArrayInputStream(xmlData.getBytes())); 
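
Note that `getBytes()` with no argument encodes using the platform default charset. A variant that names the charset explicitly might look like the sketch below. `StringParseSketch` and `parseUtf8` are illustrative names, and the sketch assumes the XML either has no encoding declaration or declares UTF-8, as in the sample file from the question.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class StringParseSketch {
    // Hypothetical helper: re-encodes the string as UTF-8 bytes and lets the
    // parser read a byte stream instead of a character stream.
    static Document parseUtf8(String xmlData) throws Exception {
        // The charset is named so the result does not depend on the platform default.
        byte[] bytes = xmlData.getBytes("UTF-8");
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(bytes));
    }
}
```

If the string starts with U+FEFF, encoding it as UTF-8 turns that character back into the bytes EF BB BF, which the stock JDK parser accepts as a BOM at the start of a byte stream. That may be why this approach gets past the error, although it does not repair text that was decoded with the wrong charset earlier on.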