utf8_encode or decode isn't doing what I expected
I am taking an XML file and reading it into various strings, before writing to a database, however I am having difficulty with German characters.
The XML file starts off
<?xml version="1.0" encoding="UTF-8"?>
Then an example of where I am having problems is this part
<name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprache Android]]></name>
My PHP has this relevant section
$dom = new DOMDocument();
$domNode = $xmlReader->expand();
$element = $dom->appendChild($domNode);
$domString = utf8_encode($dom->saveXML($element));
$product = new SimpleXMLElement($domString);
//read in data
$arr = $product->attributes();
$link_ident = $arr["id"];
$link_id = $platform . "" . $link_ident;
$link_name = $product->name;
So $link_name becomes PONS GroÃwörterbuch Deutsch als Fremdsprache Android
I then did a
$link_name = utf8_decode($link_name);
Which when I echoed back in terminal worked fine
PONS GroÃwörterbuch Deutsch als Fremdsprache Android as is now
PONS Großwörterbuch Deutsch als Fremdsprache Android after utf8decode
However when it is written into my database it appears as:
PONS Kompaktwörterbuch Deutsch-Englisch (Android)
The collation for link_name in MySQL is utf8_general_ci
How should I be doing this to get it correctly written into my database?
This is the code I use to write to the database
$link_name = utf8_decode($link_name);
$link_id = mysql_real_escape_string($link_id);
$link_name = mysql_real_escape_string($link_name);
$description = mysql_real_escape_string($description);
$metadesc = mysql_real_escape_string($metadesc);
$link_created = mysql_real_escape_string($link_created);
$link_modified = mysql_real_escape_string($link_modified);
$website = mysql_real_escape_string($website);
$cost = mysql_real_escape_string($cost);
$image_name = mysql_real_escape_string($image_name);
$query = "REPLACE into jos_mt_links
(link_id, link_name, alias, link_desc, user_id, link_published,link_approved, metadesc, link_created, link_modified, website, price)
VALUES ('$link_id','$link_name','$link_name','$description','63','1','1','$metadesc','$link_created','$link_modified','$website','$cost')";
echo $link_name . " has been inserted ";
When I run it from the shell I see
PONS Kompaktwörterbuch Deutsch-Englisch (Android) has been inserted
You've got a UTF-8 string from an XML file, and you're putting it into a UTF-8 database. So there is no encoding or decoding to be done: just put the original string into the database. Make sure you've called mysql_set_charset('utf8') first to tell the database there are UTF-8 strings coming.

utf8_decode and utf8_encode are misleadingly named. They only convert between the UTF-8 and ISO-8859-1 encodings. Calling utf8_decode, which converts to ISO-8859-1, will naturally lose any characters that don't fit in that encoding. You should generally avoid these functions unless there's a specific place where you need to be using 8859-1.

You should not consider what the terminal shows when you echo a string to be definitive. The terminal has its own encoding problems, and especially under Windows it is likely to be impossible to output every character properly. On a Western Windows install the system code page (which the terminal uses to turn the bytes PHP outputs into characters on screen) will be code page 1252, which is similar to but not the same as ISO-8859-1. This is why utf8_decode, which outputs ISO-8859-1, appeared to make the text look as you expected. But that's of little use. Internally you should be using UTF-8 for all strings.
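The effect of the two functions can be checked at the byte level. The following is a minimal sketch, not the asker's actual code. It assumes the mbstring extension is loaded and that the file is saved as UTF-8. The one-element XML document and its id value are made up for illustration, and mb_convert_encoding with ISO-8859-1 stands in for utf8_encode and utf8_decode, which perform the same conversions.

```php
<?php
// Hypothetical one-product document; only the <name> content comes from the question.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<product id="1"><name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprache Android]]></name></product>';

$product = new SimpleXMLElement($xml);

// The string cast returns the CDATA text as UTF-8; no conversion is needed.
$name = (string) $product->name;

// Equivalent of utf8_encode(): treat each byte as an ISO-8859-1 character
// and encode it as UTF-8 again, which double-encodes "ß" and "ö".
$doubleEncoded = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

// Equivalent of utf8_decode(): convert UTF-8 to ISO-8859-1,
// one byte per character.
$latin1 = mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');

var_dump(mb_check_encoding($name, 'UTF-8'));   // true: safe to send to a UTF-8 column
var_dump(mb_check_encoding($latin1, 'UTF-8')); // false: a UTF-8 column cannot store these bytes as-is

// Byte lengths show the damage: double encoding grows the string,
// the ISO-8859-1 version shrinks it.
printf("%d %d %d\n", strlen($name), strlen($doubleEncoded), strlen($latin1));
```

Only $name is valid, unmodified UTF-8, so it is the value to pass on to the database once the connection character set has been set.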
You must use the mb_convert_encoding or iconv function before you write into your database.
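A conversion of this kind only helps when the input is in a different encoding from the one the database connection expects. As a rough sketch of that case, the snippet below assumes a hypothetical ISO-8859-1 input string (written with byte escapes so the file's own encoding does not matter) and assumes the mbstring and iconv extensions are both available.

```php
<?php
// Hypothetical input: "Großwörterbuch" encoded as ISO-8859-1
// (0xDF is "ß" and 0xF6 is "ö" in that encoding).
$latin1 = "Gro\xDFw\xF6rterbuch";

// Convert once, from the known source encoding to UTF-8.
// Note the argument order differs between the two functions.
$viaMb    = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);

var_dump($viaMb === $viaIconv);               // true: both give the same UTF-8 bytes
var_dump(mb_check_encoding($viaMb, 'UTF-8')); // true
```

If the input is already UTF-8, as the XML declaration in the question states, running this conversion would double-encode it.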