UTF-8 BOM signature in PHP files

Posted 2024-08-27 11:42:12

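I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a ş (which is a UTF-8 character, ...and a strange name, I know).

Even though I save the file as UTF-8, some friends reported that they see that character completely garbled (È™). Adding the BOM signature makes the problem go away. But this troubles me a bit, since I don't know much about the BOM beyond what I saw on Wikipedia and in a few similar questions here on SO.

I know that it adds a few bytes at the beginning of the file, and from what I understand that is not so bad, but I'm concerned because the only problematic scenarios I have read about involve PHP files. Since I'm writing PHP classes in order to share them, being 100% compatible matters more than having my name in the comments.

I'm trying to understand the implications: should I use it without worrying? Or are there cases where it might cause damage? If so, when?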

ゞ花落谁相伴 2024-09-03 11:42:12

Indeed, the BOM is actual data sent to the browser. The browser will happily ignore it, but you still cannot send headers once it has gone out.

I believe the problem really is your and your friend's editor settings. Without a BOM, your friend's editor may not automatically recognize the file as UTF-8. He can try to set up his editor such that the editor expects a file to be in UTF-8 (if you use a real IDE such as NetBeans, then this can even be made a project setting that you can transfer along with the code).

An alternative is to try some tricks: some editors try to determine the encoding using some heuristics based on the entered text. You could try to start each file with

<?php //Úτƒ-8 encoded

and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)

All in all, I recommend just fixing the editor settings.

Oh wait, I misread the last part: for spreading the code anywhere, I guess you're safest making all files contain only 7-bit characters, i.e. plain ASCII, or just accepting that some people with ancient editors will see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the "headers already sent" problem. On the other hand, as long as you only put non-ASCII UTF-8 characters in comments and the like, the only impact of an editor misunderstanding the encoding is weird characters. I'd go for spelling your name correctly and adding a comment targeted at heuristics so that most editors will get it, but there will always be people who see bogus characters instead.
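To make the "headers already sent" point concrete, here is a minimal sketch of how PHP treats a file saved with a BOM: everything before the first `<?php` tag is literal output, and that includes the three BOM bytes. The helper name `output_before_open_tag` is made up for this illustration; it is not a PHP built-in.

```php
<?php
// Hypothetical helper: returns the raw bytes that precede the first "<?php"
// tag in a source string. PHP echoes these bytes before running any code.
function output_before_open_tag(string $source): string
{
    $pos = strpos($source, '<?php');
    return $pos === false ? $source : substr($source, 0, $pos);
}

// A file saved as "UTF-8 with BOM" starts with EF BB BF, then the open tag.
$withBom    = "\xEF\xBB\xBF<?php header('X-Demo: 1');";
$withoutBom = "<?php header('X-Demo: 1');";

// bin2hex() makes the invisible bytes visible.
echo bin2hex(output_before_open_tag($withBom)), "\n";    // efbbbf
echo strlen(output_before_open_tag($withoutBom)), "\n";  // 0
```

Those three bytes are already output by the time the `header()` call in the file would run, which is why the call fails unless output is being buffered.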

审判长 2024-09-03 11:42:12

A BOM causes the "headers already sent" error, so you cannot use a BOM in PHP files.

迷荒 2024-09-03 11:42:12

This is an old post and has already been answered, but I can leave you some other resources that I found when I faced this BOM issue.

http://people.w3.org/rishida/utils/bomtester/index.php On this page you can check whether a specific file contains a BOM.

There is also a handy script that outputs all files with a BOM in your current directory.

<?php 
// Returns true if the file starts with the UTF-8 BOM (EF BB BF).
function fopen_utf8($filename) {
    $file = @fopen($filename, "rb");
    if ($file === false) {
        return false; // unreadable file: treat as "no BOM"
    }
    $bom = fread($file, 3);
    fclose($file);
    return $bom === "\xEF\xBB\xBF";
}

// Walks $path and prints every file that starts with a BOM.
// $exclude is a "|"-separated list of directory entries to skip.
function file_array($path, $exclude = ".|..|design", $recursive = true) {
    $path = rtrim($path, "/") . "/";
    $folder_handle = opendir($path);
    $exclude_array = explode("|", $exclude);
    $result = array();
    while (false !== ($filename = readdir($folder_handle))) {
        if (in_array(strtolower($filename), $exclude_array)) {
            continue;
        }
        if (is_dir($path . $filename)) {
            // Need to pass the full path down or it's an infinite loop
            if ($recursive) {
                $result[] = file_array($path . $filename . "/", $exclude, true);
            }
        } elseif (fopen_utf8($path . $filename)) {
            //$result[] = $filename;
            echo $path . $filename . "<br>";
        }
    }
    closedir($folder_handle);
    return $result;
}

$files = file_array(".");
?>

I found that code at php.net.

Dreamweaver also helps with this: it gives you the option to save the file without the BOM.

It's a late answer, but I still hope it helps.
Bye

万人眼中万个我 2024-09-03 11:42:12

Just so you know, there's an option in php, zend.multibyte, which allows php to read files with BOM without giving the Headers already sent error.

From the php.ini file:

; If enabled, scripts may be written in encodings that are incompatible with
; the scanner.  CP936, Big5, CP949 and Shift_JIS are the examples of such
; encodings.  To use this feature, mbstring extension must be enabled.
; Default: Off
;zend.multibyte = Off
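A quick way to see whether this setting is active on a given installation is to read it back at runtime. The sketch below assumes PHP 5.4 or later, where `zend.multibyte` is an ini directive; the function name `zend_multibyte_enabled` is invented for this example.

```php
<?php
// Hypothetical helper: reports whether zend.multibyte is switched on.
// ini_get() returns the value as a string (or false if the directive is
// unknown), so it is normalised to a boolean here.
function zend_multibyte_enabled(): bool
{
    $value = ini_get('zend.multibyte');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

echo zend_multibyte_enabled() ? "zend.multibyte is On\n" : "zend.multibyte is Off\n";
```

Since the default is Off, code meant to be shared widely probably should not rely on the people running it having turned this on.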

素年丶 2024-09-03 11:42:12

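In PHP, in addition to the "headers already sent" error, the presence of a BOM can also break the HTML in the browser in a more subtle way.

See Display problems caused by the UTF-8 BOM (W3C Internationalization) for an outline of the problem, with a focus on PHP.

When this happens, not only is there usually a noticeable gap at the top of the rendered page, but if you inspect the HTML in Firefox or Chrome you may notice that the head section is empty and its elements appear to be in the body.

Of course, viewing the source will show everything where it was inserted, but the browser interprets the BOM as body content (text) and inserts it as such into the Document Object Model (DOM).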
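One way to confirm this diagnosis is to look for the BOM bytes in the generated HTML itself, since a BOM from an included template ends up in the middle of the output where it is hard to spot. The following is a rough sketch; `find_bom_offsets` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: returns the byte offset of every UTF-8 BOM (EF BB BF)
// found in a string of generated output.
function find_bom_offsets(string $output): array
{
    $bom = "\xEF\xBB\xBF";
    $offsets = [];
    $pos = 0;
    while (($pos = strpos($output, $bom, $pos)) !== false) {
        $offsets[] = $pos;
        $pos += strlen($bom); // continue searching after this BOM
    }
    return $offsets;
}

// Simulated output: a BOM at the very start and another one injected by an
// included file just before <body>.
$html = "\xEF\xBB\xBF<html><head><title>t</title></head>\xEF\xBB\xBF<body></body></html>";
print_r(find_bom_offsets($html));
```

An offset of 0 points at the entry script or the first file that produced output; any other offset points at something included later.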

淡淡の花香 2024-09-03 11:42:12

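Or you can activate output buffering in php.ini, which will solve the "headers already sent" problem. If your site is under heavy load, output buffering is also very important for performance.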
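With buffering on, the BOM sits in the buffer instead of being flushed to the client, so later `header()` calls still work. As a sketch of the same idea done from code rather than php.ini, the example below captures a page's output and also drops any BOM bytes before returning it. The helper name `render_buffered` is an assumption made for this example.

```php
<?php
// Hypothetical helper: runs $page with output buffering on and returns what it
// printed, minus any UTF-8 BOMs. Nothing reaches the client while the buffer
// is active, so $page may still call header().
function render_buffered(callable $page): string
{
    ob_start();
    try {
        $page();
    } finally {
        $output = ob_get_clean(); // always close the buffer, even on exceptions
    }
    return str_replace("\xEF\xBB\xBF", '', $output);
}

$html = render_buffered(function () {
    echo "\xEF\xBB\xBF";                      // stands in for a BOM leaked by an included file
    echo "<!DOCTYPE html><title>ok</title>";
});
echo $html;
```

This only helps for BOMs emitted after `ob_start()` has run. A BOM at the top of the entry script itself is output before any code executes, which is the case the php.ini `output_buffering` setting covers.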

婴鹅 2024-09-03 11:42:12

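The BOM is actually the most efficient way of identifying a UTF-8 file, and both modern browsers and the standards support and encourage its use in HTTP response bodies.

In the case of PHP files, it is not the file but the generated output that gets sent as the response, so obviously it is not a good idea to save all PHP files with a BOM at the beginning. That does not mean you shouldn't use a BOM in your response.

In fact, you can safely inject the following code right before your doctype declaration (if you are generating HTML as the response):

(or, before PHP 7.0.0: )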
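As a hedged illustration of the idea in this answer (emitting the BOM as part of the response body rather than saving it into the source file), a sketch could look like the following. It assumes PHP 7.0 or later for the `\u{FEFF}` escape, with the raw byte form as the fallback on older versions; `with_response_bom` is a name invented here, and this is not necessarily the answerer's exact code.

```php
<?php
// Hypothetical helper: prepends a UTF-8 BOM to a response body unless one is
// already there. "\u{FEFF}" (PHP 7.0+) encodes to the bytes EF BB BF.
function with_response_bom(string $body): string
{
    $bom = "\u{FEFF}"; // on PHP < 7.0 write "\xEF\xBB\xBF" instead
    if (strncmp($body, $bom, strlen($bom)) === 0) {
        return $body;  // already starts with a BOM
    }
    return $bom . $body;
}

echo with_response_bom("<!DOCTYPE html><html><head><title>demo</title></head></html>");
```

Because the BOM is produced by code, it goes out as part of the body after any `header()` calls, so it does not trigger the "headers already sent" error.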

Further reading: https://www.w3.org/International/questions/qa-byte-order-mark#transcoding

入怼 2024-09-03 11:42:12

Adding to @omabena's answer: use this code to locate and remove the BOM from your files. Be sure to back up your files first, just in case.

// Returns true if the file starts with the UTF-8 BOM (EF BB BF).
function fopen_utf8($filename) {
    $file = @fopen($filename, "rb");
    if ($file === false) {
        return false; // unreadable file: treat as "no BOM"
    }
    $bom = fread($file, 3);
    fclose($file);
    return $bom === "\xEF\xBB\xBF";
}

// Walks $path, prints every file that starts with a BOM and rewrites it without one.
// $exclude is a "|"-separated list of directory entries to skip.
function file_array($path, $exclude = ".|..|design", $recursive = true) {
    $path = rtrim($path, "/") . "/";
    $folder_handle = opendir($path);
    $exclude_array = explode("|", $exclude);
    $result = array();
    while (false !== ($filename = readdir($folder_handle))) {
        if (in_array(strtolower($filename), $exclude_array)) {
            continue;
        }
        $pathname = $path . $filename;
        if (is_dir($pathname)) {
            // Need to pass the full path down or it's an infinite loop
            if ($recursive) {
                $result[] = file_array($pathname . "/", $exclude, true);
            }
        } elseif (fopen_utf8($pathname)) {
            //$result[] = $filename;
            echo $pathname . "<br>";
            // fopen_utf8() already confirmed the BOM, so drop the first three bytes
            $contents = file_get_contents($pathname);
            if ($contents !== false && file_put_contents($pathname, substr($contents, 3)) !== false) {
                printf("%s BOM removed.<br/>\n", $pathname);
            }
        }
    }
    closedir($folder_handle);
    return $result;
}

$files = file_array(".");
