Loading XMLDocument from a byte array (optionally containing BOM characters)
I've seen several posts here on SO about loading XML documents from a data source where the data has Microsoft's proprietary UTF-8 preamble (for instance, this one).
However, I can't find an elegant (and working!) solution that does not involve stripping out the BOM characters manually.
For instance, there is this example:
// Read the whole file, including any leading BOM bytes.
byte[] b = System.IO.File.ReadAllBytes("c:\\temp_file_containing_bom.txt");
using (System.IO.MemoryStream oByteStream = new System.IO.MemoryStream(b)) {
    using (System.Xml.XmlTextReader oRD = new System.Xml.XmlTextReader(oByteStream)) {
        System.Xml.XmlDocument oDoc = new System.Xml.XmlDocument();
        oDoc.Load(oRD);  // this is the call that throws
        Console.WriteLine(oDoc.OuterXml);
        Console.ReadLine();
    }
}
...but it still keeps throwing an "invalid data" exception.
My problem is that I have a huge byte array which sometimes contains the BOM and sometimes does not. I need to load it into an XMLDocument, and I don't believe that I should be the one who has to take care of the "helper" bytes.
Comments (1)
That BOM is no longer 'proprietary'. It's written up in the XML specs. Only old versions of Java (1.4) have a problem with it. It's pretty humorous if you've got MS technology exploding.
Use a buffered input stream to filter out the BOM: read the first character and, if it is not the first character of the BOM sequence, push it back; otherwise skip past the BOM.
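For what it's worth, here is a minimal C# sketch of that idea, adapted to the byte array from the question. It is not a pushback stream as such: since the data is already in memory, it checks for the three UTF-8 BOM bytes and starts the MemoryStream after them when they are present. The names BomAwareXml and LoadFromBytes are made up for illustration, and the sketch assumes only the UTF-8 BOM needs handling (not the UTF-16 or UTF-32 ones).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the .NET framework.
static class BomAwareXml
{
    public static XmlDocument LoadFromBytes(byte[] data)
    {
        // The UTF-8 preamble is the byte sequence EF BB BF.
        byte[] bom = Encoding.UTF8.GetPreamble();

        // Skip the preamble only when the array really starts with it.
        int offset = 0;
        if (data.Length >= bom.Length &&
            data[0] == bom[0] && data[1] == bom[1] && data[2] == bom[2])
        {
            offset = bom.Length;
        }

        // Wrap the remaining bytes without copying the (possibly huge) array.
        using (var stream = new MemoryStream(data, offset, data.Length - offset, false))
        {
            var doc = new XmlDocument();
            doc.Load(stream);
            return doc;
        }
    }
}
```

Calling BomAwareXml.LoadFromBytes(b) with the array from the question should then behave the same whether or not the BOM is there. If the exception persists after that, the BOM is probably not the real cause and the content itself is worth inspecting.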