PHP: How do I convert foreign characters from simple_html_dom to UTF-8?
I'm having trouble with a string that comes from a web page containing foreign (non-ASCII) characters. The string is generated by parsing the page with str_get_html() and then reading $htmldom->innertext (simple_html_dom class library).
When I output the string using htmlentities(), it displays fine. But when I call explode() on the string and print the parts, I get a tilted block with a question mark in it (the replacement character) for each foreign character.
I need to store the string in a utf8 MySQL database, so I need the foreign characters to be correct. My page sends a header with the UTF-8 character set. I have already tried mb_split() and preg_split(), but they have the same problem.
Comments (2)
PHP and UTF-8 are not a very good combination. Some functions work fine with UTF-8, others don't, and the worst are those that are documented to work but in fact do not (such as DOMDocument).
You can use mb_convert_encoding() to convert multibyte characters to HTML entities, which usually provides an acceptable workaround.
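The snippet that originally followed this answer is not preserved here, so the following is only a minimal sketch of the idea, not the answerer's code. It assumes the input is already valid UTF-8. Because the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2, the sketch swaps in mb_encode_numericentity(), which yields numeric rather than named entities. The helper name utf8ToNumericEntities is hypothetical.

```php
<?php
// Hypothetical helper for illustration; not part of simple_html_dom
// and not taken from the original answer.
function utf8ToNumericEntities(string $text): string
{
    // Convert every code point from U+0080 to U+10FFFF into a decimal
    // entity. ASCII, including HTML markup characters, is left untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// Older form named in the answer (deprecated since PHP 8.2):
// $encoded = mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8');

// Assumed sample input: UTF-8 text with two accented characters.
$encoded = utf8ToNumericEntities("caf\u{00E9},na\u{00EF}ve");

// The result is pure ASCII, so byte-oriented functions such as
// explode() cannot split a multibyte character in half.
$parts = explode(',', $encoded);
print_r($parts); // the parts are "caf&#233;" and "na&#239;ve"
```

Entities only render correctly in an HTML context. If the text is stored in the database this way, it has to be decoded again (for example with html_entity_decode()) wherever the raw characters are needed.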
I solved the issue with:
https://github.com/neitanod/forceutf8
It has a great function that converts anything to UTF-8, no matter what source it comes from (as long as the input is already Latin-1 (ISO 8859-1), Windows-1252 or UTF-8, or a mix of them).
Many thanks go to Sebastian Grignoli.
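The library itself is not reproduced here. As a rough, dependency-free approximation of the same idea, the sketch below uses only PHP's mbstring extension. It assumes the whole input is either valid UTF-8 or single-byte Windows-1252. Unlike what the answer describes for the library, it does not handle strings that mix the two. The helper name normalizeToUtf8 is hypothetical.

```php
<?php
// Hypothetical helper for illustration; this is NOT the ForceUTF8 code.
// Assumption: input is either valid UTF-8 or Windows-1252, never a mix.
function normalizeToUtf8(string $raw): string
{
    // Already valid UTF-8: return unchanged to avoid double-encoding.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise interpret the bytes as Windows-1252 and convert them.
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Assumed sample input: Latin-1 / Windows-1252 bytes, as a scraped
// page might deliver them.
$scraped = "caf\xE9,na\xEFve";

// Normalize first, then split. The parts are now valid UTF-8.
$parts = explode(',', normalizeToUtf8($scraped));
print_r($parts);
```

Normalizing once, right after reading innertext, means every later step (explode(), output on the UTF-8 page, the database insert) sees the same encoding.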