Is a byte order mark needed here?
I am generating a CSV file with PHP that users download through the browser.
Given that the target system could be a Mac, Unix, Windows, or something else, do I need to insert the byte order mark (BOM) bytes at the beginning of the file?
Comments (2)
No, you are not required to.
The byte order mark (BOM) can appear in the Unicode encodings UTF-8, UTF-16 and UTF-32, where it signals that the text is Unicode in that encoding.
In UTF-16 and UTF-32 it also tells the reader the byte order (big-endian or little-endian) of the data that follows.
It is optional in UTF-8 and UTF-32, but valid. In UTF-8 there is no byte-order ambiguity, so it acts only as an encoding signature, and it can cause compatibility issues with software that does not expect it. To quote a well-phrased Wikipedia entry:
I would advise against using the BOM in UTF-8 for those reasons.
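To make the bytes under discussion concrete, here is a minimal sketch. It is written in Python purely for illustration (the question is about PHP, where the same three bytes would simply be written before the CSV data). The helper name `build_csv_bytes` and the sample rows are assumptions made for this example, not something from the thread.

```python
import codecs
import csv
import io

def build_csv_bytes(rows, with_bom=False):
    """Serialize rows to UTF-8 CSV bytes, optionally prefixed with the UTF-8 BOM.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    payload = buffer.getvalue().encode("utf-8")
    return codecs.BOM_UTF8 + payload if with_bom else payload

# The BOM is the code point U+FEFF, encoded differently per encoding.
print(codecs.BOM_UTF8)      # b'\xef\xbb\xbf'
print(codecs.BOM_UTF16_LE)  # b'\xff\xfe'
print(codecs.BOM_UTF16_BE)  # b'\xfe\xff'

rows = [["name", "city"], ["Zoë", "Köln"]]
with_bom = build_csv_bytes(rows, with_bom=True)

# A consumer that is not BOM-aware sees U+FEFF glued to the first field,
# which is the kind of compatibility issue mentioned above.
print(repr(with_bom.decode("utf-8")[:5]))      # '\ufeffname'

# A BOM-aware decoder strips it.
print(repr(with_bom.decode("utf-8-sig")[:4]))  # 'name'
```

The only difference between the two variants is the three leading bytes, so the choice comes down to what the consuming program expects.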
As for the original question, it really depends on the encoding the file is written in. If it is UTF-8 encoded, I would add the BOM. If the file contains only ASCII characters, the BOM can be omitted because there will be no multi-byte sequences. If the file does contain UTF-8 sequences, however, it is easier for a consumer to detect the BOM than to walk through the whole file checking for valid sequences. And even if it finds one valid-looking sequence, those bytes could still be individual characters above 0x7F in some single-byte encoding.
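The detection trade-off described above can be sketched as follows. This is a rough illustration in Python, not how any particular spreadsheet or editor actually decides; the function name `classify` and its return labels are made up for this example.

```python
import codecs

def classify(data: bytes) -> str:
    """Guess how a byte string is encoded (hypothetical heuristic)."""
    # Cheap check: only the first three bytes are inspected.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    # Pure ASCII is byte-identical in UTF-8, so a BOM adds nothing here.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Expensive check: the whole input must be validated.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown 8-bit encoding"
    # Valid UTF-8, but this remains a guess without a BOM.
    return "probably utf-8 (no BOM)"

print(classify(codecs.BOM_UTF8 + b"a,b\r\n"))  # utf-8 with BOM
print(classify(b"a,b\r\n"))                    # ascii
print(classify("Zoë".encode("utf-8")))         # probably utf-8 (no BOM)
print(classify("Zoë".encode("latin-1")))       # unknown 8-bit encoding

# The ambiguity from the answer: these two bytes are "é" in UTF-8,
# but they are also the two characters "Ã©" in Latin-1.
ambiguous = b"\xc3\xa9"
print(ambiguous.decode("utf-8"), ambiguous.decode("latin-1"))
```

Without a BOM the last branch can only say "probably", which is the point the answer makes about single characters above 0x7F.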