PHP: Problems converting the "’" character from ISO-8859-1 to UTF-8

Posted on 2024-09-19 12:31:54

I'm having some issues with using PHP to convert ISO-8859-1 database content to UTF-8. I am running the following code to test:

// Connect to a latin1 charset database 
// and retrieve "Georgia O’Keeffe", which contains a "’" character
$connection = mysql_connect('*****', '*****', '*****');
mysql_select_db('*****', $connection);
mysql_set_charset('latin1', $connection);
$result = mysql_query('SELECT notes FROM categories WHERE id = 16', $connection);
$latin1Str = mysql_result($result, 0);
$latin1Str = substr($latin1Str, strpos($latin1Str, 'Georgia'), 16);

// Try to convert it to UTF-8
$utf8Str = iconv('ISO-8859-1', 'UTF-8', $latin1Str);

// Output both
var_dump($latin1Str);
var_dump($utf8Str);

When I run this in Firefox's source view, making sure Firefox's encoding setting is set to "Western (ISO-8859-1)", I get this:

asd

So far, so good. The first output contains that weird quote and I can see it correctly because it's in ISO-8859-1 and so is Firefox.

After I change Firefox's encoding setting to "UTF-8", it looks like this:

asd

Where did the quote go? Wasn't iconv() supposed to convert it to UTF-8?

Comments (2)

雨的味道风的声音 2024-09-26 12:31:54

U+2019 RIGHT SINGLE QUOTATION MARK is not a character in ISO-8859-1. It is a character in windows-1252, as 0x92. The actual ISO-8859-1 character 0x92 is a rarely-used C1 control character called "Private Use 2".

It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling but it is not standard behaviour and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.

It appears that this is what's happening here. Change "ISO-8859-1" to "windows-1252".
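To make the difference concrete, here is a minimal sketch that skips the database and hard-codes the bytes instead. It assumes the stored string contains the single byte 0x92 where the apostrophe is (the variable names are illustrative, not from the question), and it assumes the local iconv build accepts the charset name "WINDOWS-1252" (on most builds "CP1252" is an alias).

```php
<?php
// Assumption: this is what the latin1 connection returns, with the
// curly apostrophe stored as the single windows-1252 byte 0x92.
$raw = "Georgia O\x92Keeffe";

// Read as ISO-8859-1, byte 0x92 is the C1 control U+0092,
// which is encoded in UTF-8 as the two bytes C2 92 (invisible in a browser).
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Read as windows-1252, byte 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK,
// which is encoded in UTF-8 as the three bytes E2 80 99.
$asCp1252 = iconv('WINDOWS-1252', 'UTF-8', $raw);

// "Georgia O" is 9 bytes long, so the converted character starts at offset 9.
echo bin2hex(substr($asLatin1, 9, 2)), "\n"; // c292
echo bin2hex(substr($asCp1252, 9, 3)), "\n"; // e28099
```

Both calls succeed without warnings, which is why the problem is easy to miss: the ISO-8859-1 conversion yields valid UTF-8, just for a control character rather than the quote.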

西瑶 2024-09-26 12:31:54

This will solve your problem, assuming your page's charset header is UTF-8:

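// Open a connection to the MySQL server
// (the mysql_* extension was removed in PHP 7.0; mysqli_set_charset() is the mysqli equivalent)
$connection = mysql_connect($server, $username, $password);
// Charset the connection is using before the change
$charset = mysql_client_encoding($connection);
// Ask MySQL to send results as UTF-8; returns true on success, false on failure
$flagChange = mysql_set_charset('utf8', $connection);
echo "The character set is: $charset<br>mysql_set_charset result: " . var_export($flagChange, true) . "<br>";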