UTF-8 BOM signature in PHP files
I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a ș (a non-ASCII character that UTF-8 stores as more than one byte... and a strange name, I know).
Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know much about it, except from what I saw on Wikipedia and in some other similar questions here on SO.
I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments.
But I'm trying to understand the implications: should I use it without worrying, or are there cases when it might cause damage? When?
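To see where "È™" comes from: ș is U+0219, which UTF-8 stores as the two bytes 0xC8 0x99. An editor that assumes Windows-1252 (only a guess at what the friends' editors defaulted to) decodes each byte on its own, and in that code page 0xC8 is È and 0x99 is ™. A minimal sketch, assuming the mbstring extension is available:

```php
<?php
// Demo: how the UTF-8 bytes of "ș" look when misread as Windows-1252.
// Assumes the mbstring extension is loaded.

$name = "\u{0219}"; // ș, LATIN SMALL LETTER S WITH COMMA BELOW

// In UTF-8 this single character occupies two bytes.
echo bin2hex($name), PHP_EOL; // c899

// Treat those same bytes as Windows-1252 and convert to UTF-8 for display:
// 0xC8 becomes "È" and 0x99 becomes "™".
$misread = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');
echo $misread, PHP_EOL; // È™
```

The BOM "fixes" this only because it tells such an editor up front that the bytes are UTF-8, so it stops guessing.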
Indeed, the BOM is actual data sent to the browser. The browser will happily ignore it, but you can no longer send headers after it.
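A small experiment makes this concrete: anything before the opening <?php tag, including the three BOM bytes EF BB BF, is treated as inline output. The sketch below writes a hypothetical BOM-prefixed file to a temporary location, includes it under output buffering, and inspects what it printed. It assumes zend.multibyte is Off, which is the default.

```php
<?php
// Sketch: a UTF-8 BOM at the top of a PHP file becomes output.
// Assumes zend.multibyte=Off (the default), so PHP does not strip the BOM.

const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical file: BOM followed by an ordinary PHP block.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, UTF8_BOM . "<?php echo 'hello';");

// Capture everything the included file prints.
ob_start();
include $path;
$output = ob_get_clean();
unlink($path);

// The BOM is emitted before "hello". In a web request it is the first byte of
// the response body, so once it is flushed, later header() calls fail.
var_dump(bin2hex($output)); // string(16) "efbbbf68656c6c6f"
```

In a real request the BOM is flushed immediately unless output buffering is holding it back, which is why the error appears as soon as a BOM-prefixed file is included before a header() call.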
I believe the problem really is your and your friends' editor settings. Without a BOM, your friend's editor may not automatically recognize the file as UTF-8. He can try to set up his editor so that it expects files to be in UTF-8 (if you use a real IDE such as NetBeans, this can even be made a project setting that you can transfer along with the code).
An alternative is to try some tricks: some editors try to determine the encoding using some heuristics based on the entered text. You could try to start each file with
and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)
All in all, I recommend just fixing the editor settings.
Oh wait, I misread the last part: for spreading the code anywhere, I guess you're safest making all files contain only the lower 7-bit characters, i.e. plain ASCII, or just accepting that some people with ancient editors will see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the "headers already sent" problem. On the other hand, as long as you only put UTF-8 characters in comments and the like, the only impact of an editor misunderstanding the encoding is weird characters. I'd go for spelling your name correctly and adding a comment targeted at the heuristics so that most editors will get it, but there will always be people who see bogus characters instead.
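If you go the plain-ASCII route, it is easy to check a file before sharing it: any byte at or above 0x80 means the file is not 7-bit ASCII, which catches both a BOM and a character like ș. A minimal sketch; the function name is hypothetical:

```php
<?php
// Hypothetical helper: is this string (e.g. a file's contents) pure 7-bit ASCII?
// Without the /u modifier PCRE works on raw bytes, so [\x80-\xFF] matches any
// byte outside ASCII, including every byte of a multibyte UTF-8 sequence.
function is_plain_ascii(string $bytes): bool
{
    return preg_match('/[\x80-\xFF]/', $bytes) === 0;
}

var_dump(is_plain_ascii("<?php\n/** @author John Doe */"));   // bool(true)
var_dump(is_plain_ascii("<?php\n/** @author \u{0219} */"));    // bool(false): contains ș
var_dump(is_plain_ascii("\xEF\xBB\xBF<?php"));                  // bool(false): starts with a BOM
```

Typical use would be is_plain_ascii(file_get_contents('MyClass.php')) on each class before publishing it (the filename is a placeholder).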
BOM would cause a "Headers already sent" error, so you can't use a BOM in PHP files.
This is an old post that has already been answered, but I can leave you some other resources that I found when I faced this BOM issue.
http://people.w3.org/rishida/utils/bomtester/index.php: with this page you can check whether a specific file contains a BOM.
There is also a handy script that outputs all files with a BOM in your current directory.
I found that code at php.net
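The script itself did not survive here. As a rough stand-in (not the php.net code), the following sketch walks a directory recursively and returns every .php file whose first three bytes are the UTF-8 BOM; the function name and the .php filter are assumptions.

```php
<?php
// Sketch: list .php files under $dir that start with a UTF-8 BOM.
// Not the php.net script; the name and the extension filter are illustrative.
function find_files_with_bom(string $dir): array
{
    $found = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        // Only the first three bytes matter for BOM detection.
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}
```

To mimic the original script's behaviour, call it as find_files_with_bom(getcwd()) and print each returned path.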
Dreamweaver also helps with this: it gives you the option to save the file without including the BOM.
It's a late answer, but I still hope it helps.
Bye
Just so you know, there's an option in PHP, zend.multibyte, which allows PHP to read files with a BOM without giving the "Headers already sent" error. From the php.ini file:
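The php.ini excerpt itself is missing here. For reference, a minimal fragment that turns the option on might look like the following. It assumes the mbstring extension is loaded, since zend.multibyte depends on it; zend.detect_unicode, which is on by default, is the setting that checks scripts for a BOM.

```ini
; Enable multibyte script support (requires the mbstring extension).
zend.multibyte = On

; Check scripts for a BOM / multibyte encoding (default: On).
zend.detect_unicode = On
```

Keep in mind this only helps on servers you control; code shared with others will still run on installations where zend.multibyte is Off.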
In PHP, in addition to the "headers already sent" error, the presence of a BOM can also screw up the HTML in the browser in more subtle ways.
See "Display problems caused by the UTF-8 BOM" (W3C Internationalization) for an outline of the problem, with some focus on PHP.
When this occurs, not only is there usually a noticeable space at the top of the rendered page, but if you inspect the HTML in Firefox or Chrome, you may notice that the head section is empty and its elements appear to be in the body.
Of course, viewing the source will show everything where it was inserted, but the browser interprets it as body content (text) and places it there in the Document Object Model (DOM).
Or you could activate output buffering in php.ini, which will solve the "headers already sent" problem. It is also very important to use output buffering for performance if your site has significant load.
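A minimal php.ini fragment for this, assuming a 4096-byte buffer is large enough to hold everything printed before your header() calls (On instead of a number means an unlimited buffer):

```ini
; Buffer up to 4096 bytes of output before sending anything to the client.
output_buffering = 4096
```

Note that buffering only postpones sending the output: the BOM still reaches the browser, so the HTML rendering problems described above are not fixed by this setting.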
A BOM is actually the most efficient way of identifying a UTF-8 file, and both modern browsers and standards support and encourage its use in HTTP response bodies.
In the case of PHP files, it's not the file but the generated output that gets sent as the response, so obviously it's not a good idea to save all PHP files with a BOM at the beginning, but that doesn't mean you shouldn't use a BOM in your response.
You can in fact safely inject the following code right before your doctype declaration (in case you are generating HTML as response):
<?="\u{FEFF}"?>
(or, before PHP 7.0.0: <?="\xEF\xBB\xBF"?>)
For further reading: https://www.w3.org/International/questions/qa-byte-order-mark#transcoding
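The two snippets emit the same three bytes; the \u{...} escape in double-quoted strings only exists since PHP 7.0.0. A quick check, plus a sketch of where the BOM goes relative to the doctype (the page markup is a placeholder):

```php
<?php
// The PHP 7+ Unicode escape and the raw byte escape yield the same UTF-8 BOM.
$bomEscape = "\u{FEFF}";
$bomBytes  = "\xEF\xBB\xBF";
var_dump($bomEscape === $bomBytes); // bool(true)

// Placeholder page: the BOM is produced by code as the very first output,
// directly before the doctype, so the .php file itself can be saved without one.
$page = $bomEscape
    . "<!DOCTYPE html>\n"
    . "<html><head><meta charset=\"utf-8\"><title>Demo</title></head>"
    . "<body>\u{0219}</body></html>\n";
echo $page;
```

Because the BOM is now part of the output rather than the source file, it has to be emitted after any header() calls, just like any other body content.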
Adding to @omabena's answer, use this code to locate and remove the BOM from your files. Be sure to back up your files first, just in case.
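The code referred to above is missing. A minimal sketch of the removal step (a hypothetical helper, not the original answer's code) that rewrites a single file only if it really starts with a BOM; run it over the paths found by a directory scan to clean a whole tree, and keep that backup:

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true if a BOM was found and removed, false otherwise.
function strip_bom_from_file(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or no BOM: leave the file untouched
    }
    // Write everything after the first three bytes back to the same file.
    return file_put_contents($path, substr($contents, 3)) !== false;
}
```

Running it twice on the same file is harmless: the second call finds no BOM and returns false without writing.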