How to get the glyph for a character encoded as 'ō' from a UTF-8 encoded database field using PHP?
I have a MySQL database table with a collation of 'utf8_general_ci' and the value in the field is:
x & #299; bán yá wén (without the spaces).
When this is converted (for example by StackOverflow's editor) it looks like this:
xī bán yá wén
where the second character looks like a lower case i with a bar over the top.
In PHP, what function converts the & #299 ; entity into the ī character?
I've tried using html_entity_decode($str, ENT_COMPAT, 'UTF-8'), but I get characters like the following:
yÄ«n wén or zhÅ•ng wén
I'm pretty sure there's something I don't understand about the decoding, which is why I'm using the wrong function. Can anyone shed some light on how to get the single character glyph that's represented by the entity & #299 and similar high-number characters above 255?
Many thanks,
AE
UTF-8 is a multibyte encoding. As such, if you view it through a single-byte encoding such as Latin-1, you'll see something much like the results you're seeing. Set the document encoding to UTF-8 to see the actual character.
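A minimal sketch of that effect, assuming the mbstring extension is enabled: ī (U+012B) is stored in UTF-8 as the two bytes 0xC4 0xAB, and reading those bytes as ISO-8859-1 (Latin-1) turns each byte into its own character. The variable names are illustrative only.

```php
<?php
// Correct UTF-8 text: "ī" is U+012B, encoded as the two bytes 0xC4 0xAB.
$utf8 = "y\u{012B}n";
echo bin2hex($utf8), "\n"; // 79c4ab6e

// Simulate a page declared with a single-byte charset: treat every byte
// as a separate ISO-8859-1 character and re-encode it as UTF-8 for display.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // yÄ«n  (0xC4 -> Ä, 0xAB -> «)
```

The garbled output matches the yÄ«n from the question, which suggests the stored or decoded bytes were fine and only the display encoding was wrong.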
As for your first question, it's actually the browser that's decoding the character reference and printing the character, not PHP.
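To check that PHP is not the one garbling the text, this sketch (assuming PHP 7.2+ with mbstring, for mb_ord and mb_chr) compares the raw field value with the result of the exact html_entity_decode call from the question. The decoded string contains the correct UTF-8 bytes; it only renders correctly if the page is also served as UTF-8, for example with the header call shown in the comment. $raw and $iMacron are hypothetical names.

```php
<?php
// The value as stored in the database: plain ASCII containing a numeric
// character reference. Echoing this as-is lets the browser decode &#299;.
$raw = 'x&#299;';

// The call from the question: decodes the reference into real UTF-8 bytes.
$decoded = html_entity_decode($raw, ENT_COMPAT, 'UTF-8');

echo bin2hex($raw), "\n";     // 7826233239393b  (x & # 2 9 9 ;)
echo bin2hex($decoded), "\n"; // 78c4ab  (x, then ī as 0xC4 0xAB)
echo mb_ord(mb_substr($decoded, 1, 1, 'UTF-8'), 'UTF-8'), "\n"; // 299

// Going the other way: build the character directly from its code point.
$iMacron = mb_chr(299, 'UTF-8');

// In a web context, declare UTF-8 so the browser shows "xī" rather than
// "xÄ«":  header('Content-Type: text/html; charset=UTF-8');
echo $decoded === 'x' . $iMacron ? "match\n" : "mismatch\n"; // match
```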
I suggest you read through this page: Unicode for the working PHP programmer. It isn't long, and it should get you over the hump and working confidently with Unicode and UTF-8.
Once you're OK with that stuff, check out the mbstring and intl PHP extensions, which are very handy. And know which string functions in PHP are and are not safe to use on multibyte strings. Here are the notes I made when I was transitioning a site to UTF-8, which include a list of naughty string functions.
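As a small illustration of the difference (assuming mbstring is enabled, and not a reproduction of the linked list): the byte-oriented strlen and substr count and cut bytes, while their mb_ counterparts work on characters. The sample string is made up from the question's data.

```php
<?php
$s = "x\u{012B} b\u{00E1}n"; // "xī bán": 6 characters, 8 bytes in UTF-8

// Byte-oriented functions: ī and á take 2 bytes each.
echo strlen($s), "\n";                // 8
echo bin2hex(substr($s, 1, 1)), "\n"; // c4  (half of ī, invalid on its own)

// Multibyte-aware equivalents: count and slice by character.
echo mb_strlen($s, 'UTF-8'), "\n";       // 6
echo mb_substr($s, 1, 1, 'UTF-8'), "\n"; // ī
echo mb_strtoupper($s, 'UTF-8'), "\n";   // XĪ BÁN
```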