PHP UTF-8 Unicode encoding

For PHP developers, which is better: Unicode or UTF-8?

Posted on 2024-09-01 07:05:05

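I'm going to build an international CMS, so my clients will be all over the world and will speak every possible language.

Which encoding is better for browser recognition and for storing data in the database?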

Comments (5)

能否归途做我良人 2024-09-08 07:05:05

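"Unicode" is not an encoding. You probably mean UTF-8 versus UTF-16 (big-endian or little-endian). For browser support it really doesn't matter much: any modern browser supports all three. You will probably find that UTF-8 is the most space-efficient choice for your database.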

眉黛浅 2024-09-08 07:05:05

UTF-8 is an encoding of Unicode, a way of representing an (abstract) sequence of Unicode characters as a (concrete) sequence of bytes. There are other encodings, such as UTF-16 (which has both big-endian and little-endian variants). Both UTF-8 and UTF-16 can represent any character in Unicode, so you can support all languages regardless of which one you choose.

UTF-8 is useful if most of your text is in Western languages, since it represents ASCII characters in just one byte, but it needs three bytes each for many characters in "foreign" scripts such as Chinese. UTF-16, on the other hand, uses exactly two bytes for every character in Unicode's "Basic Multilingual Plane", which covers most everyday text; characters outside it (emoji and rarer ideographs, for example) require four.
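The size difference is easy to check. The following is a minimal sketch in Python (not PHP), chosen only because it makes the byte counts visible; the sample characters and the helper name encoded_sizes are illustrative assumptions, not part of the answer above.

```python
# Compare how many bytes single characters need in UTF-8 and UTF-16.
# "utf-16-le" is used so that no byte-order mark is added to the count.
samples = {
    "A": "ASCII letter",
    "\u00e9": "accented Latin letter (e with acute)",
    "大": "CJK ideograph inside the Basic Multilingual Plane",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch, description in samples.items():
    utf8_size, utf16_size = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} {description}: "
          f"UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

For mixed content such as HTML, where the markup itself is ASCII, these per-character numbers tend to favor UTF-8 even when the visible text is mostly CJK.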

I wouldn't recommend using PHP for developing international software, though, because it doesn't properly support Unicode. It has some add-on functions for working with Unicode encodings (look at the multibyte string functions), but the PHP core treats strings as bytes, not characters, so the standard PHP string functions are not suitable for characters that are encoded as more than one byte. For example, if you call PHP's strlen() on a string containing the UTF-8 representation of the character "大", it returns 3, because that character takes up three bytes in UTF-8 even though it is only one character. Using string-splitting functions like substr() is risky because splitting in the middle of a multi-byte character corrupts the string.
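The byte-versus-character distinction can be reproduced outside PHP. This is a rough sketch in Python, where a bytes object stands in for a byte-oriented PHP string; the sample text and the helper is_valid_utf8 are assumptions made for illustration. In PHP itself, the multibyte string extension provides mb_strlen() and mb_substr() as the character-aware counterparts.

```python
text = "大家好"                # three characters
raw = text.encode("utf-8")     # the bytes a byte-oriented function sees

char_count = len(text)         # counts characters
byte_count = len(raw)          # counts bytes, like strlen() on UTF-8 data


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Cutting after 4 bytes lands in the middle of the second character,
# which is what a byte-based substring can do to multi-byte text.
broken = raw[:4]

# Cutting by characters first and encoding afterwards keeps the text intact.
safe = text[:2].encode("utf-8")

print(char_count, byte_count)
print(is_valid_utf8(broken), is_valid_utf8(safe))
```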

Most other languages used for Web development, such as Java, C#, and Python, have built-in support for Unicode, so you can put arbitrary Unicode characters into a string without worrying about which encoding represents them in memory: from your point of view, a string contains characters, not bytes. This is a much safer, less error-prone way to work with Unicode text. For this and other reasons (PHP isn't really that great a language), I'd recommend using something else.

(PHP 6 was meant to bring proper Unicode support, but it was never released, and later PHP versions still treat strings as byte sequences.)

暖心男生 2024-09-08 07:05:05

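UTF-8 is an encoding of Unicode. You probably mean choosing between UTF-8 and UTF-16.

Microsoft recommends:

"Developers should use UTF-8 for all Unicode data they send to and receive from the browser."

For database storage, use whichever encoding your RDBMS supports best. Otherwise, all else being equal, choose by space efficiency: UTF-8 is smaller for English and most European languages, while UTF-16 tends to be smaller for Asian languages.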

若言繁花未落 2024-09-08 07:05:05

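Unicode is a standard that defines a set of abstract characters (so-called code points) and their properties (whether a character is a digit, an uppercase letter, and so on). It also defines certain encodings (ways of representing characters as bytes), and UTF-8 is one of them. See "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky for more details.

I would definitely go with UTF-8. It is the standard everywhere these days and has some nice properties, such as preserving all 7-bit ASCII characters. That means most HTML-related functions, such as htmlspecialchars, can be used directly on the UTF-8 representation, so there is less chance of leaving encoding-related security holes. In addition, many PHP functions explicitly expect UTF-8 strings, and UTF-8 also has better text editor support than alternatives such as UTF-16.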
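The ASCII-preserving property can be illustrated with a small sketch. It is written in Python rather than PHP, and escape_html_bytes is a hypothetical byte-level helper invented for this example; it is not PHP's htmlspecialchars, only a stand-in for any escaping routine that works on raw bytes.

```python
def escape_html_bytes(data):
    """Escape HTML metacharacters by plain byte replacement,
    with no knowledge of the text encoding."""
    return (data.replace(b"&", b"&amp;")
                .replace(b"<", b"&lt;")
                .replace(b">", b"&gt;"))


text = "<b>大</b> & \u00e9"

# On UTF-8 the byte-level escape is safe: the result still decodes cleanly.
escaped = escape_html_bytes(text.encode("utf-8")).decode("utf-8")

# The reason: every byte of a non-ASCII character in UTF-8 is 0x80 or higher,
# so it can never be mistaken for "<", ">" or "&".
all_high = all(b >= 0x80 for b in "大\u00e9".encode("utf-8"))

# UTF-16 gives no such guarantee: U+3C00 is encoded as the bytes 0x3C 0x00
# in big-endian order, and 0x3C is also the ASCII code for "<".
contains_lt = b"<" in "\u3c00".encode("utf-16-be")

print(escaped)
print(all_high, contains_lt)
```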

傾旎 2024-09-08 07:05:05

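UTF-8 is the better choice because it can represent the accented characters of every language in the world. It also leaves room for extension, so characters that are not yet assigned or recognized can be added later. I prefer UTF-8 and its family and always use them.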
