UTF-8, PHP and XML MySQL

Posted on 2024-08-12 06:23:02

I am having a lot of trouble solving this one:

I have a MySQL database with encoding latin1_swedish_ci and a table that stores names and addresses.

I am trying to output a UTF-8 XML file, but I am having problems with the following string:

Otivägen is output as OtivÃ¤gen when I open the file in vim. Also, when I open it in IE, I get

"An invalid character was found in text content. Error processing resource"

I have the following code:

function fixEncoding($in_str)
{
    $cur_encoding = mb_detect_encoding($in_str) ;
    if($cur_encoding == "UTF-8" && mb_check_encoding($in_str,"UTF-8"))
        return $in_str;
    else
        return utf8_encode($in_str);
}

header("Content-type: text/plain;charset=utf-8");
$mystring = "Otivägen"; // this is actually obtained from the database

$myxml = "<myxml>
....
     <node>".$mystring."</node>
....
</myxml>
";
$myxml = fixEncoding($myxml);

The actual XML output is below:

<?xml version="1.0" encoding="UTF-8" ?>
<myxml>
    ....
    <node>OtivÃ¤gen</node>
    ....
</myxml>

Any ideas how I can output the file so that in vim it reads Otivägen and not OtivÃ¤gen?

EDIT:

I called mysql_client_encoding() and got latin1.
I then called mysql_set_charset()
and ran mysql_client_encoding() again, which returned utf8, but the output problem remains.

Edit 2

I logged in on the MySQL command line and ran the following query:

SELECT address1 FROM address WHERE id = 1000;
Current database: ftpuser_db

+-------------+
|   address1  |
+-------------+
| Otivägen 32 |
+-------------+
1 row in set (0.06 sec)

Thanks in advance!

Comments (6)

謸气贵蔟 2024-08-19 06:23:02

I think you did everything correctly, except that your terminal is in Latin-1.

The UTF-8 sequence for ä is C3 A4, which is Ã¤ if displayed as Latin-1.
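
To see this at the byte level, here is a minimal PHP sketch. It assumes the mbstring extension is loaded (the question already relies on mb_* functions), and the variable names are made up for illustration. The string is written with byte escapes so the result does not depend on how the source file itself is saved.

```php
<?php
// "ä" encoded as UTF-8 is the two bytes C3 A4 (written as escapes on purpose).
$utf8 = "Otiv\xC3\xA4gen";

echo bin2hex("\xC3\xA4"), "\n"; // c3a4

// Simulate a Latin-1 terminal: treat every byte as one Latin-1 character,
// then re-encode as UTF-8 so the result is printable on a UTF-8 terminal.
$asSeenInLatin1 = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $asSeenInLatin1, "\n"; // OtivÃ¤gen
```

If the file is correct and only the display is wrong, the bytes on disk will still be C3 A4 for ä.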

时光倒影 2024-08-19 06:23:02

Is your MySQL connection encoding properly set to UTF-8?

Check mysql_set_charset() and mysql_client_encoding() for more details.
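
The mysql_* functions named above were removed in PHP 7. As a rough sketch of the same check with mysqli (not something anyone in this thread posted), the hypothetical helper below sets the connection charset when it connects. The function name, host, and credentials are placeholders.

```php
<?php
/**
 * Hypothetical helper: open a mysqli connection and force its charset.
 * 'utf8' matches the thread; on current MySQL servers 'utf8mb4' is the
 * full UTF-8 character set.
 */
function connectWithCharset(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    // Throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $db);
    $link->set_charset($charset); // modern counterpart of mysql_set_charset()

    return $link;
}

// Usage (placeholder credentials, so left commented out):
// $link = connectWithCharset('localhost', 'user', 'secret', 'ftpuser_db');
// echo $link->character_set_name(), "\n"; // counterpart of mysql_client_encoding()
```

Setting the charset through the driver rather than a raw query also keeps the client library's escaping functions aware of the encoding in use.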

忆离笙 2024-08-19 06:23:02

Oh boy. UTF8 issues can be a real pain and they get almost impossible to solve when something is doing re-encodings for you.

You really need to start at one end and make sure every process is UTF8. That stops anything along the way from misinterpreting the data and 'converting' it for you. Just as importantly, it makes it much easier to spot when something has already mis-encoded text for you (yes, I've had that problem).

And if you have UTF8 data in tables that aren't set to UTF8 and might be mis-encoded, you need to do the tables last, after the data has been re-encoded. Otherwise you will damage your data irretrievably. I've had that problem, too.

First steps:

  • Check your terminal is UTF8 compliant. Gnome-terminal is. Kterm is. ETerm is not.
  • Check your LANG setting in your shell. It should probably have .UTF-8 at the end of its value.
  • Check that vim is picking up the UTF8 setting correctly. You can check with :set encoding

This will mean that your files will be edited in UTF8.

Now we check MySQL.

In the MySQL CLI, run show variables like 'character_set%'; The results will probably look something like this:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

What you're aiming for is to change all those latin1 values (or whatever you're seeing) to utf8.

set names utf8; will change most of them, and you might need to run it for every new connection to your database. This was the solution I had to adopt in a previous application. The other settings live in the my.cnf file, for which I need to direct you to the documentation. It is unlikely you will need to set them all.

I see you're already setting the output headers, so that's good.

Now you can look at the data from the database and see why it's "wrong".

痕至 2024-08-19 06:23:02

latin1_swedish_ci is a collation, not a charset. Since collations are supposed to match their charset, it suggests that the table is using latin1, but it's not a guarantee.

Strictly speaking, the charset of the tables is irrelevant here, since MySQL can convert input/output. That's what the connection charset (mysql_set_charset) is for. However, for that to work properly, the data needs to be encoded properly in the database. I would begin by checking that strings are correct in the database. The simplest thing is to log in on the command line and select a row which has non-ASCII characters in it. Does it look OK?

$mystring = "Otivägen"; // this is actually obtained from the database

Watch out. The encoding of the data in $mystring will now depend on the encoding of the PHP file. That may or may not be the same as the data in the database.
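
One way to check what a string really contains is to look at its bytes rather than at how it is displayed. The sketch below assumes mbstring is available; the two literals use byte escapes to stand in for a source file saved as Latin-1 and one saved as UTF-8, and the variable names are illustrative only.

```php
<?php
$savedAsLatin1 = "Otiv\xE4gen";     // what the literal holds if the .php file is Latin-1
$savedAsUtf8   = "Otiv\xC3\xA4gen"; // what it holds if the .php file is UTF-8

foreach (['latin1 file' => $savedAsLatin1, 'utf8 file' => $savedAsUtf8] as $label => $s) {
    printf(
        "%-11s bytes=%d hex=%s valid UTF-8: %s\n",
        $label,
        strlen($s),                                   // byte count, not character count
        bin2hex($s),
        mb_check_encoding($s, 'UTF-8') ? 'yes' : 'no'
    );
}
```

The Latin-1 version is 8 bytes and fails the UTF-8 check; the UTF-8 version is 9 bytes and passes. Running bin2hex() on a value fetched from the database shows in the same way which of the two you actually received.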

念﹏祤嫣 2024-08-19 06:23:02

Before output, run the query SET NAMES utf8.

After output, you can switch back by running SET NAMES latin1.

Look here, I've got the same problem.

哭泣的笑容 2024-08-19 06:23:02

It seems you are "double encoding" Otivägen. You get this behaviour if Otivägen is already UTF-8 and you run utf8_encode() on it again. Example:

$str = "Otivägen"; // already a UTF-8 string
echo utf8_encode($str); // outputs OtivÃ¤gen

I'm not sure where the actual "double encoding" occurs, but it may be due to settings in your editor. Here is my theory. Let's say you are running Aptana Studio and your character set is set to ISO-8859-1. (In Aptana, you can check this by right-clicking a file and choosing "Properties". To set the default character encoding for all projects, choose Preferences from the Aptana main menu -> General -> Workspace.) If that's the case, the PHP source file containing $myxml and its string <myxml><node>... is detected as ISO-8859-1, but $mystring received from the database is UTF-8. Your fixEncoding function would then run the else clause, since $myxml as a whole is seen as ISO-8859-1 and not UTF-8. This double encodes the results from the database, and may be the cause of your problem.

Check the encoding of your actual source file in your editor, and verify that it is set to UTF-8. Alternatively, experiment with applying or removing fixEncoding/utf8_encode/utf8_decode on $myxml. Observe the results and see what needs to be done to get the value Otivägen right.
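
To make the double-encoding theory concrete, here is a small sketch, assuming mbstring is loaded and that any input which is not valid UTF-8 is Latin-1. ensureUtf8() is a hypothetical replacement for fixEncoding(), not code from the question; it uses mb_convert_encoding() because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
/** Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8. */
function ensureUtf8(string $s, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already UTF-8, converting again would double encode
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

$utf8   = "Otiv\xC3\xA4gen"; // UTF-8 bytes, e.g. from a utf8 connection
$latin1 = "Otiv\xE4gen";     // Latin-1 bytes, e.g. from a latin1 connection

// Blind conversion of text that is already UTF-8 reproduces the symptom.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($doubleEncoded), "\n"; // 4f746976c383c2a467656e ("OtivÃ¤gen")

// Converting each value on its own, before building the XML, avoids it.
echo bin2hex(ensureUtf8($utf8)), "\n";   // 4f746976c3a467656e
echo bin2hex(ensureUtf8($latin1)), "\n"; // 4f746976c3a467656e
$myxml = "<node>" . ensureUtf8($latin1) . "</node>";
```

The validity check is a heuristic: a short Latin-1 string can happen to be valid UTF-8, so knowing the connection charset is more reliable than guessing from the bytes.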
