utf8_encode or decode is not doing what I expected

Posted 2024-09-07 09:43:01

I am taking an XML file and reading it into various strings, before writing to a database, however I am having difficulty with German characters.

The XML file starts off

<?xml version="1.0" encoding="UTF-8"?>

Then an example of where I am having problems is this part

<name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprache Android]]></name>

My PHP has this relevant section

$dom = new DOMDocument();
$domNode = $xmlReader->expand();
$element = $dom->appendChild($domNode);
$domString = utf8_encode($dom->saveXML($element));
$product = new SimpleXMLElement($domString);

//read in data
$arr = $product->attributes();
$link_ident = $arr["id"];
$link_id =  $platform . "" . $link_ident;
$link_name = $product->name;

So $link_name becomes PONS GroÃwörterbuch Deutsch als Fremdsprache Android

I then did a

$link_name = utf8_decode($link_name);

Which when I echoed back in terminal worked fine

PONS GroÃwörterbuch Deutsch als Fremdsprache Android as is now 
PONS Großwörterbuch Deutsch als Fremdsprache Android after utf8decode 

However when it is written into my database it appears as:

PONS Kompaktwörterbuch Deutsch-Englisch (Android)

The collation for link_name in MySQL is utf8_general_ci

How should I be doing this to get it correctly written into my database?

This is the code I use to write to the database

$link_name = utf8_decode($link_name);
$link_id = mysql_real_escape_string($link_id);
$link_name = mysql_real_escape_string($link_name);
$description = mysql_real_escape_string($description);
$metadesc = mysql_real_escape_string($metadesc);
$link_created = mysql_real_escape_string($link_created);
$link_modified = mysql_real_escape_string($link_modified);
$website = mysql_real_escape_string($website);
$cost = mysql_real_escape_string($cost);
$image_name = mysql_real_escape_string($image_name);
$query = "REPLACE into jos_mt_links
(link_id, link_name, alias, link_desc, user_id, link_published,link_approved, metadesc, link_created, link_modified, website, price)
VALUES ('$link_id','$link_name','$link_name','$description','63','1','1','$metadesc','$link_created','$link_modified','$website','$cost')";
echo $link_name . " has been inserted ";

and when I run it from shell I see

PONS Kompaktwörterbuch Deutsch-Englisch (Android) has been inserted

Comments (2)

梦年海沫深 2024-09-14 09:43:01

You've got a UTF-8 string from an XML file, and you're putting it into a UTF-8 database. So there is no encoding or decoding to be done; just shove the original string into the database. Make sure you've called mysql_set_charset('utf8') first to tell the database there are UTF-8 strings coming. (MySQL spells the character set name 'utf8', without a hyphen.)
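A minimal sketch of that write path, under some loud assumptions: it uses the mysqli extension rather than the mysql_* functions from the question (those were removed in PHP 7), the helper name save_link is hypothetical, and the column list is trimmed to three of the columns from the question's query for brevity. The string goes in exactly as it came out of the XML, with no utf8_encode or utf8_decode.

```php
<?php
// Hypothetical helper: $db is an already-open mysqli connection.
// Table and column names are taken from the question; the column list is trimmed.
function save_link(mysqli $db, string $link_id, string $link_name): bool
{
    // Declare that the bytes sent over this connection are UTF-8.
    // MySQL's names are 'utf8' / 'utf8mb4'; 'utf-8' is not a valid name.
    if (!$db->set_charset('utf8mb4')) {
        return false;
    }

    // A prepared statement replaces the manual mysql_real_escape_string() calls.
    $stmt = $db->prepare(
        'REPLACE INTO jos_mt_links (link_id, link_name, alias) VALUES (?, ?, ?)'
    );
    if ($stmt === false) {
        return false;
    }

    $stmt->bind_param('sss', $link_id, $link_name, $link_name);
    $ok = $stmt->execute();
    $stmt->close();

    return $ok;
}

// Usage, with placeholder credentials:
// $db = new mysqli('localhost', 'user', 'password', 'database');
// save_link($db, $link_id, (string) $product->name);
```

Setting the connection character set once, right after connecting, would work just as well; it is done inside the helper here only to keep the sketch in one piece.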

utf8_decode and utf8_encode are misleadingly named. They are only for converting between UTF-8 and ISO-8859-1 encodings. Calling utf8_decode, which converts to ISO-8859-1, will naturally lose any characters you have that don't fit in that encoding. You should generally avoid these functions unless there's a specific place where you need to be using 8859-1.
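To make the double encoding visible, here is a small sketch that reproduces what the two functions do, using mb_convert_encoding as a stand-in (an assumption on my part: the mbstring extension is available; utf8_encode and utf8_decode themselves are deprecated as of PHP 8.2). The bytes are written as escapes so the result does not depend on how the script file is saved.

```php
<?php
// Requires the mbstring extension (assumed to be available).

// "ß" as it arrives from the UTF-8 XML file: two bytes, C3 9F.
$utf8 = "\xC3\x9F";

// Equivalent of utf8_encode(): treat every input byte as an ISO-8859-1
// character and re-encode it as UTF-8. On input that is already UTF-8
// this double-encodes the text.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n"; // c383c29f

// Equivalent of utf8_decode(): convert UTF-8 to ISO-8859-1. Applied to the
// double-encoded string it only undoes the earlier damage.
$undone = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($undone), "\n"; // c39f

// Applied to a correct UTF-8 string, anything outside ISO-8859-1 is lost:
// the euro sign (E2 82 AC) has no ISO-8859-1 equivalent and comes back as
// a substitute character.
$euro = "\xE2\x82\xAC";
$lost = mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8');
echo bin2hex($lost), "\n";
```

So the encode-then-decode round trip in the question cancels itself out at best, and any further decode turns the text into ISO-8859-1.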

You should not consider what the terminal shows when you echo a string to be definitive. The terminal has its own encoding problems and especially under Windows it is likely to be impossible to output every character properly. On a Western Windows install the system code page (which the terminal will use to turn the bytes PHP spits out into characters to display on-screen) will be code page 1252, which is similar to but not the same as ISO-8859-1. This is why utf8_decode, which spits out ISO-8859-1, appeared to make the text appear as you expected. But that's of little use. Internally you should be using UTF-8 for all strings.
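Rather than trusting the terminal, one option is to look at the bytes directly. The sketch below assumes the SimpleXML and mbstring extensions and uses a cut-down, made-up product element (the id value is invented) in place of the real feed; it reads the name without any utf8_encode or utf8_decode call and checks the result as hex.

```php
<?php
// Requires the SimpleXML and mbstring extensions (assumed to be available).

// A cut-down stand-in for one product from the feed; the id is made up.
// "\xC3\x9F" is ß and "\xC3\xB6" is ö in UTF-8.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<product id="42"><name><![CDATA[PONS Gro' . "\xC3\x9F" . 'w' . "\xC3\xB6"
     . 'rterbuch]]></name></product>';

$product = new SimpleXMLElement($xml);

// Cast the element to a string; SimpleXML hands back UTF-8.
$link_name = (string) $product->name;

// Check the encoding by looking at bytes, not at rendered glyphs.
var_dump(mb_check_encoding($link_name, 'UTF-8')); // bool(true)
echo bin2hex(substr($link_name, 8, 2)), "\n";      // c39f, the two bytes of ß
```

If the hex shows c39f for ß, the string is correct UTF-8 regardless of what the terminal happens to draw.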

薄荷梦 2024-09-14 09:43:01

You must use the mb_convert_encoding or iconv function before you write into your database.
