Is a byte order mark needed here?

Posted 2024-10-17 08:31:20

I am generating a CSV file with PHP, to be downloaded through the browser.
Do I need to insert the byte order mark (BOM) bytes at the beginning, given that the target system could be Mac, Unix, Windows, etc.?

Comments (2)

凑诗 2024-10-24 08:31:20

No, you are not required to.

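The byte order mark (BOM) is the character U+FEFF placed at the start of a text. It can appear in UTF-8, UTF-16 and UTF-32, where it also acts as a signature suggesting that the text is Unicode.

In UTF-16 (and UTF-32), it tells the reader whether the bytes are stored big-endian or little-endian. It does not distinguish UTF-16 from UCS-2.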

It is optional in UTF-8 and UTF-32, but valid. However, in UTF-8, it can cause compatibility issues. To quote a well-phrased Wikipedia entry:

"If compatibility with existing programs is not important, the BOM could be used to identify if a file is in UTF-8 versus a legacy encoding, but this is still problematic, due to many instances where the BOM is added or removed without actually changing the encoding, or various encodings are concatenated together. Checking if the text is valid UTF-8 is more reliable than using BOM."
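
The validity check recommended in the quote can be sketched in PHP, the language the question uses. This is only an illustration: `isValidUtf8` is a made-up helper name, and the sketch relies on the PCRE `u` modifier, which makes `preg_match()` fail on input that is not well-formed UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when
// $bytes is well-formed UTF-8.
function isValidUtf8(string $bytes): bool
{
    // With the "u" modifier, preg_match() returns false for malformed UTF-8;
    // the empty pattern matches any well-formed subject and returns 1.
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8("Zo\u{00EB}"));  // "e with diaeresis" encoded as UTF-8
var_dump(isValidUtf8("Zo\xEB"));      // the same letter as a lone Latin-1 byte
```

Note that pure ASCII text always passes this check, so it only separates UTF-8 from a legacy encoding once bytes above 0x7F are present.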

For those reasons, I would advise against using the BOM in UTF-8.
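
For reference, here is a minimal sketch of how the CSV from the question could be built with the BOM left as a switch, so both variants can be tried against the target programs. `buildCsv` and its `$withBom` parameter are hypothetical names, and the sample rows are invented for illustration.

```php
<?php
// Hypothetical helper for illustration: builds CSV text in memory and
// optionally prefixes the UTF-8 BOM (bytes EF BB BF).
function buildCsv(array $rows, bool $withBom): string
{
    $stream = fopen('php://temp', 'r+');
    if ($withBom) {
        fwrite($stream, "\xEF\xBB\xBF");
    }
    foreach ($rows as $row) {
        // Separator, enclosure and escape character are passed explicitly.
        fputcsv($stream, $row, ',', '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

// Invented sample data; the escapes produce UTF-8 encoded characters.
$rows = [['name', 'city'], ["Zo\u{00EB}", "K\u{00F6}ln"]];
$csv  = buildCsv($rows, false);   // pass true to prepend the BOM
echo $csv;
```

In a download script, the returned string would typically be sent after a header such as `Content-Type: text/csv; charset=utf-8`, which states the encoding without touching the file contents.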

好多鱼好多余 2024-10-24 08:31:20

Concerning the original question, it really depends on how the file is encoded when it is written. If it will be UTF-8 encoded, I'd add the BOM. If the file contains only ASCII characters, the BOM can be left out because there will be no multi-byte sequences. If, however, the file does contain UTF-8 sequences, detecting the BOM is easier than walking through the whole file and checking for valid sequences. And even if you detect a single valid sequence, it might still be individual characters above 0x7F (for example, from a legacy single-byte encoding).
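
To show what "detecting the BOM" amounts to on the reading side, here is a small sketch. `detectBom` is a hypothetical helper name; the byte sequences are the standard encodings of U+FEFF.

```php
<?php
// Hypothetical helper for illustration: returns the encoding name implied by
// a leading BOM, or null when the data does not start with one.
function detectBom(string $bytes): ?string
{
    // UTF-32LE must be tested before UTF-16LE: both start with FF FE.
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

var_dump(detectBom("\xEF\xBB\xBFname,city\n"));  // data with a UTF-8 BOM
var_dump(detectBom("name,city\n"));              // data without any BOM
```

This only inspects the first few bytes, which is the convenience this answer points to. It is still a guess: UTF-16LE text that begins with a BOM followed by U+0000 is indistinguishable from a UTF-32LE BOM, and a missing BOM says nothing about the encoding.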
