PHP htmlspecialchars() error when trying to use a UTF-8 string
I did the following things:
- I have a spreadsheet with data. One of the rows has a ü character in it.
- I save this as a CSV file in OpenOffice.org. When it asks me for a character encoding, I choose UTF-8.
- I use Navicat to create a MySQL database table (InnoDB, with UTF-8 utf8_general encoding) and import the CSV.
- I try to use the PHP function `htmlspecialchars($string, ENT_COMPAT, 'UTF-8')`, where `$string` is the string containing the special ü character.
It gives me an error: Invalid multibyte sequence in argument. When I change `'UTF-8'` to `'ISO8859-1'`, no error is thrown, but the wrong character is shown (the "unknown character" character, which looks like <?>).
If I use an HTML form to update the string in the database, the error disappears and the character is displayed correctly. However, when I then look at the record in Navicat, it looks like two characters:
[1/4][A with something on top of it]
Some multibyte sequence that isn't seen as one character.
What is going on, where are things going wrong, and what can I do about it?
Comments (1)
Although I don't understand where the "invalid multibyte" error comes from, I'm pretty sure `htmlspecialchars()` is not your culprit: in my understanding, `htmlspecialchars()` should work fine for a UTF-8 string without specifying a character set. My bet would be that either the HTML page containing the form or the database connection you use is not UTF-8 encoded. For the latter, try telling MySQL to use UTF-8 for the connection before doing the insert.
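One way to check which encoding actually reaches PHP is to look at the raw bytes. The sketch below is only an illustration under assumptions: `hexBytes()` and `isValidUtf8()` are hypothetical helpers, and the two sample strings stand in for what the database might return. It shows that a lone ISO-8859-1 `ü` byte is not valid UTF-8, which makes `htmlspecialchars()` return an empty string (older PHP versions reportedly also raised the warning quoted in the question), and that the two UTF-8 bytes of `ü` read as ISO-8859-1 come out as the "Ã" and "¼" pair described for Navicat.

```php
<?php
// Hypothetical helpers for illustration; neither appears in the original post.

// Render the raw bytes of a string as space-separated uppercase hex.
function hexBytes(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// True when $s is a well-formed UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false on a malformed UTF-8 subject.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8   = "\xC3\xBC"; // "ü" as UTF-8: two bytes
$latin1 = "\xFC";     // "ü" as ISO-8859-1: one byte

echo hexBytes($utf8), "\n";   // C3 BC
echo hexBytes($latin1), "\n"; // FC

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($latin1)); // bool(false)

// Invalid UTF-8 input, with neither ENT_SUBSTITUTE nor ENT_IGNORE set,
// makes htmlspecialchars() return an empty string.
var_dump(htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8')); // string(0) ""

// The same two UTF-8 bytes interpreted as ISO-8859-1 are two separate characters.
echo htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1'), "\n"; // &Atilde;&frac14;
echo htmlentities($utf8, ENT_COMPAT, 'UTF-8'), "\n";      // &uuml;
```

If `hexBytes()` on the value read from the database prints `FC`, the string is arriving as ISO-8859-1 and the connection character set is the likely suspect. Assuming the mysqli extension is in use (the question does not say which client library it uses), calling `$mysqli->set_charset('utf8')` right after connecting is one way to request UTF-8 for the connection.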