Are there any downsides to saving all source code files in UTF-8?

Posted on 2024-10-30 15:08:24

If that's relevant (it very well could be), they are PHP source code files.

Comments (4)

眸中客 2024-11-06 15:08:24

There are a few pitfalls to take care of:

  1. PHP does not recognize the byte order mark (BOM) that some editors and IDEs put at the very beginning of UTF-8 files. This invisible character marks the file as UTF-8, but it is not required. Because PHP treats the BOM as ordinary output and sends it to the browser, functions that deal with HTTP headers can fail with a "headers already sent" warning, since no header can be sent once output has started. Make sure your text editor saves as UTF-8 without a BOM; if you're not sure, run a quick test: if <?php header('Content-Type: text/html') ?> at the beginning of an otherwise empty file doesn't trigger a warning, you're fine.
  2. The default string functions are not aware of multibyte encodings. strlen returns the number of bytes in the string, not the number of characters. This isn't much of a problem until you start slicing strings that contain non-ASCII characters with functions like substr: the indices you pass are byte offsets rather than character offsets, so your script can cut a non-ASCII character in two. For instance, echo substr("é", 0, 1) outputs an invalid UTF-8 sequence, because in UTF-8 é takes two bytes and substr returns only the first one. (The solution is to use the mb_ string functions, which are aware of multibyte encodings.)
  3. You must ensure that your data sources (such as external text files or databases) return UTF-8 strings too, because PHP performs no automatic conversion. To that end, you can use implementation-specific means (for instance, MySQL has a special query that lets you specify the encoding in which you expect results: SET CHARACTER SET UTF8 or something along these lines), or, if you can't find a better way, mb_convert_encoding or iconv will convert a string from one encoding to another.
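
The three pitfalls above can be illustrated with a short sketch. It assumes PHP 7 or later with the mbstring extension enabled; the helper names hasUtf8Bom and stripUtf8Bom are hypothetical and only serve the illustration, and ISO-8859-1 is just an example source encoding.

```php
<?php
// Sketch only: hasUtf8Bom() and stripUtf8Bom() are hypothetical helpers,
// not part of PHP. Requires the mbstring extension.

// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

/** Returns true if the given contents start with the UTF-8 BOM. */
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

/** Removes a leading UTF-8 BOM if there is one. */
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

// Pitfall 2: byte-oriented functions versus multibyte-aware ones.
// The string "été" is written with escapes so that the example does not
// depend on the encoding this file is saved in.
$s = "\u{00E9}t\u{00E9}";

$byteLength = strlen($s);              // counts bytes
$charLength = mb_strlen($s, 'UTF-8');  // counts characters
printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
// prints "bytes: 5, characters: 3"

$brokenFirst = substr($s, 0, 1);             // only the first byte of "é"
$wholeFirst  = mb_substr($s, 0, 1, 'UTF-8'); // the complete character "é"
printf("substr: %s, mb_substr: %s\n", bin2hex($brokenFirst), bin2hex($wholeFirst));
// prints "substr: c3, mb_substr: c3a9"

// Pitfall 3: converting data from another encoding to UTF-8.
$latin1 = "\xE9"; // "é" encoded as ISO-8859-1
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
printf("converted: %s\n", bin2hex($utf8));
// prints "converted: c3a9"
```

Running hasUtf8Bom(file_get_contents($path)) over a project's source files is one way to find files saved with a BOM. For the database case, setting the connection character set through the database driver is generally preferable to converting strings after the fact.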
椵侞 2024-11-06 15:08:24

It's usually recommended that you keep all source files in UTF-8. It makes no difference to the size of ordinary code written in Latin characters, but it prevents glitches with any special characters.

紫﹏色ふ单纯 2024-11-06 15:08:24

If you use any special characters, e.g. in string values, the files are a little bigger, but that shouldn't matter.

Nevertheless, my suggestion is to always leave the default format. I lost many hours because something went wrong when saving in a different format and all the characters changed.

From a technical point of view, there isn't a difference!

七七 2024-11-06 15:08:24

Very relevant: the PHP parser may start to output spurious characters, like a funky upside-down question mark. Just stick to the norm; that's much preferred.
