NSXMLParser and BOM bytes

Posted on 2024-08-17 20:36:33

I'm getting an XML file as the result of a PHP query to a server. When I print the resulting data to the console, I see a well-structured XML document. When I try to parse it with NSXMLParser, it fails with an error in NSXMLParserErrorDomain, code 4 (empty document).
The documents it can't parse have a BOM (byte order mark) sequence right after the closing '>' of the XML header. The question is how to get rid of the BOM sequence. I tried building a string from the BOM bytes like this:

    const UInt8 bom[3] = {0xEF, 0xBB, 0xBF};
    NSString *bomString = [[NSString alloc] initWithData:[NSData dataWithBytes:(const void *)bom length:3] encoding:NSUTF8StringEncoding];
    NSString *noBOMString = [theResult stringByReplacingOccurrencesOfString:bomString withString:@" "];

but for some reason it doesn't work. There are also XML documents that have this sequence after the root element; in that case NSXMLParser parses them successfully. Safari ignores those characters, and so does the Xcode debugger. Please help!

Thanks,

Nava


Comments (3)

绿萝 2024-08-24 20:36:33

I tried to create a string with those BOM bytes like that:

    const UInt8 bom[3] = {0xEF, 0xBB, 0xBF};
    NSString *bomString = [[NSString alloc] initWithData:[NSData dataWithBytes:(const void *)bom length:3] encoding:NSUTF8StringEncoding];
    NSString *noBOMString = [theResult stringByReplacingOccurrencesOfString:bomString withString:@" "];

but it doesn't work for some reason.

Make sure you gave the correct encoding when instantiating theResult, the string that noBOMString is derived from. If the document data was UTF-8, make sure you instantiated the string as UTF-8. Likewise, if the data was UTF-16, make sure you instantiated the string as UTF-16.

If you pass the wrong encoding, either the string won't instantiate at all (I'm assuming that isn't your problem) or some characters will be wrong. The BOM would be one of these: If the input is UTF-8 and you interpret it as MacRoman or ISOLatin1, it'll appear in the string as three separate characters. These three separate characters won't compare equal to the single character that is the BOM.
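To make the mismatch concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper names `decode_utf8_3byte` and `decode_latin1` are made up for this illustration and are not part of any Apple API. Decoded as UTF-8, the three bytes collapse into the single code point U+FEFF; decoded as ISO Latin-1, each byte becomes its own character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte UTF-8 sequence into a code point.
   Returns 0 if the bytes do not form a 3-byte sequence. */
uint32_t decode_utf8_3byte(const unsigned char *p)
{
    assert(p != NULL);
    if ((p[0] & 0xF0) != 0xE0 || (p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return 0;
    return ((uint32_t)(p[0] & 0x0F) << 12) |
           ((uint32_t)(p[1] & 0x3F) << 6) |
           (uint32_t)(p[2] & 0x3F);
}

/* Hypothetical helper: ISO Latin-1 maps every byte to the code point
   with the same value, so n bytes always yield n characters. */
size_t decode_latin1(const unsigned char *p, size_t n, uint32_t *out)
{
    assert(p != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
    return n;
}
```

For the bytes EF BB BF, the first function yields one code point (0xFEFF) and the second yields three (0xEF, 0xBB, 0xBF), so a one-character BOM string cannot match the three-character run produced by the wrong encoding.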

执着的年纪 2024-08-24 20:36:33

I'm not certain that this is the issue. I've had a very similar experience where the file was encoded as UTF-8, but the XML header claimed it was UTF-16.

Because of the mismatch, parsing failed with the same error you got. Changing the XML header from UTF-16 to UTF-8 fixed the issue for me.

You may be experiencing a similar issue.
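One way to check for this kind of mismatch is to look at the first bytes of the downloaded data and compare them with what the XML header declares. The following is a rough sketch in plain C (also valid Objective-C); `bom_encoding` is a hypothetical helper, and it only recognizes the UTF-8 and UTF-16 marks (a UTF-32LE mark would be reported as UTF-16LE here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the encoding from a leading BOM.
   Returns a constant string, or NULL when the data has no leading BOM. */
const char *bom_encoding(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)
        return "UTF-16LE";
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)
        return "UTF-16BE";
    return NULL;
}
```

If the result disagrees with the `encoding` attribute in the XML declaration, the header (or the server output) is probably what needs fixing. A NULL result says nothing either way, since a BOM is optional.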

思念绕指尖 2024-08-24 20:36:33

Well, maybe this is not the best way to get rid of BOM bytes, but it works. For those who, like me, spent hours trying to make NSXMLParser swallow BOMs:
Assume you get your data through NSURLConnection and store it in NSMutableData *webData.

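    // UTF-8 byte order mark; unsigned so the values fit without narrowing.
    const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};

    // NSData is not NUL-terminated, so track the length explicitly
    // instead of relying on strstr/strlen.
    unsigned char *data = [webData mutableBytes];
    NSUInteger length = webData.length;
    NSUInteger i = 0;
    while (i + 3 <= length) {
        if (memcmp(data + i, bom, 3) == 0) {
            // Source and destination overlap, so use memmove, not memcpy.
            memmove(data + i, data + i + 3, length - i - 3);
            length -= 3;
        } else {
            i++;
        }
    }

    NSMutableData *newData = [[NSMutableData alloc] initWithBytes:data length:length];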

Then you create your parser with newData and it JUST WORKS! I'll be glad to get any comments/improvements to this code
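As one possible improvement, the removal can be done in a single pass that tracks lengths explicitly, so it does not depend on the buffer being NUL-terminated. This is only a sketch in plain C (it also compiles as Objective-C); `strip_utf8_boms` is a hypothetical name, not an existing API. Note that it drops every EF BB BF sequence, including any that were meant as a zero-width no-break space inside the content.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove every UTF-8 BOM (EF BB BF) from buf in place.
   Returns the new length; bytes past that length are left unspecified. */
size_t strip_utf8_boms(unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    size_t r = 0; /* read position */
    size_t w = 0; /* write position, never ahead of r */

    assert(buf != NULL || len == 0);
    while (r < len) {
        if (len - r >= 3 && memcmp(buf + r, bom, 3) == 0) {
            r += 3;               /* skip the BOM */
        } else {
            buf[w++] = buf[r++];  /* keep this byte */
        }
    }
    return w;
}
```

With an NSMutableData, this could be used by passing `webData.mutableBytes` and `webData.length`, then shrinking the object with `setLength:` to the returned value before handing it to the parser.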
