Why doesn't org.apache.xerces.parsers.SAXParser skip the BOM in UTF-8 encoded XML?

Posted on 2024-10-23 00:11:38

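I have a UTF-8 encoded XML file, and it has a BOM at the beginning. As a result, parsing fails with org.xml.sax.SAXParseException: Content is not allowed in prolog. I can't remove these 3 bytes from the file, and I can't load the file into memory and strip them there (the file is large). For performance reasons I'm using a SAX parser, and I just want to skip these 3 bytes if they appear before the first tag. Should I subclass InputStreamReader for this?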

I'm new to Java, so please point me to the right approach.

一腔孤↑勇 2024-10-30 00:11:38

This question has come up before; I found the answer on Stack Overflow when it happened to me. The linked answer uses a PushbackInputStream to test for the BOM.

山人契 2024-10-30 00:11:38

I've experienced the same problem and I've solved it with this code:

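import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

private static InputStream checkForUtf8BOM(InputStream inputStream) throws IOException {
    PushbackInputStream pushbackInputStream = new PushbackInputStream(new BufferedInputStream(inputStream), 3);
    byte[] bom = new byte[3];
    // A single read() may return fewer than 3 bytes, so count what actually arrived
    int n = 0;
    int r;
    while (n < 3 && (r = pushbackInputStream.read(bom, n, 3 - n)) != -1) {
        n += r;
    }
    boolean hasBom = n == 3 && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF;
    if (!hasBom && n > 0) {
        pushbackInputStream.unread(bom, 0, n); // not a BOM: push back only the bytes read
    }
    return pushbackInputStream;
}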
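
To show how a stream cleaned this way might be handed to a SAX parser, here is a minimal sketch. It goes through the JDK's javax.xml.parsers.SAXParserFactory rather than instantiating org.apache.xerces.parsers.SAXParser directly, and the class name BomSkippingParse, the helpers skipUtf8Bom, elementNames and sampleWithBom, and the sample document are all made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class BomSkippingParse {

    // Hypothetical helper: drops a leading EF BB BF and pushes back anything else.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break; // stream is shorter than 3 bytes
            }
            n += r;
        }
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // restore exactly the bytes that were read
        }
        return pushback;
    }

    // Parses the stream with SAX and returns the element names in document order.
    static List<String> elementNames(InputStream raw) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        // The stream is consumed incrementally, so a large file is never held in memory.
        InputSource source = new InputSource(skipUtf8Bom(raw));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return names;
    }

    // Builds a tiny in-memory document prefixed with a UTF-8 BOM (illustrative data only).
    static byte[] sampleWithBom() {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><item/></root>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[xml.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, out, 3, xml.length);
        return out;
    }
}
```

For a real file, the argument to elementNames would be something like a FileInputStream instead of the in-memory sample.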

不打扰别人 2024-10-30 00:11:38

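import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Arrays;

// A BOM is a byte sequence, so it is matched on the raw stream before any decoding
private static final byte[] UTF32BE = { 0x00, 0x00, (byte) 0xFE, (byte) 0xFF };
private static final byte[] UTF32LE = { (byte) 0xFF, (byte) 0xFE, 0x00, 0x00 };
private static final byte[] UTF16BE = { (byte) 0xFE, (byte) 0xFF };
private static final byte[] UTF16LE = { (byte) 0xFF, (byte) 0xFE };
private static final byte[] UTF8 = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

// Consumes the given BOM if the stream starts with it, otherwise rewinds.
// The stream must support mark/reset (a BufferedInputStream does).
private static boolean removeBOM(InputStream in, byte[] bom) throws IOException {
    int bomLength = bom.length;
    in.mark(bomLength);
    byte[] possibleBOM = new byte[bomLength];
    // A single read() may return fewer bytes than requested, so loop until full or EOF
    int n = 0;
    int r;
    while (n < bomLength && (r = in.read(possibleBOM, n, bomLength - n)) != -1) {
        n += r;
    }
    if (n == bomLength && Arrays.equals(bom, possibleBOM)) {
        return true;
    }
    in.reset();
    return false;
}

private static void removeBOM(InputStream in) throws IOException {
    // UTF-32LE is tested before UTF-16LE because its BOM starts with the same two bytes
    if (removeBOM(in, UTF32BE)) {
        return;
    }
    if (removeBOM(in, UTF32LE)) {
        return;
    }
    if (removeBOM(in, UTF16BE)) {
        return;
    }
    if (removeBOM(in, UTF16LE)) {
        return;
    }
    if (removeBOM(in, UTF8)) {
        return;
    }
}

usage:

// xml can be read from a file, url or string through a stream
URL url = new URL("some xml url");
InputStream in = new BufferedInputStream(url.openStream());
removeBOM(in);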
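
If only a Reader is available, a BOM that has been decoded with the matching charset shows up as the single character U+FEFF (unless the decoder already consumed it), so one comparison is enough. The following is a minimal sketch under that assumption; ReaderBom, skipBom and utf8Reader are illustrative names, not part of any library, and the reader has to support mark/reset, which a BufferedReader does.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReaderBom {

    // Skips one leading U+FEFF if present; returns true when a BOM was consumed.
    static boolean skipBom(Reader reader) throws IOException {
        if (!reader.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        reader.mark(1);
        if (reader.read() == '\uFEFF') {
            return true;
        }
        reader.reset(); // first character was real content (or EOF): rewind
        return false;
    }

    // Hypothetical convenience: decodes bytes as UTF-8 with an explicit charset,
    // so the result does not depend on the platform default encoding.
    static Reader utf8Reader(byte[] bytes) {
        return new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

After skipBom returns, the reader can be wrapped in an org.xml.sax.InputSource and passed to the parser.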
