Strange characters in database text: Ã, Ã, ¢, â‚ €
I'm not certain when this first occurred.
I have a new drop-shipping affiliate website, and receive an exported copy of the product catalog from the wholesaler. I format and import this into Prestashop 1.4.4.
The front end of the website contains combinations of strange characters inside product text: Ã, Ã, ¢, â‚ etc. They appear in place of common characters like , - : etc.
These characters are present in about 40% of the database tables, not just product specific tables like ps_product_lang.
Another website thread says this same problem occurs when the database connection string uses an incorrect character encoding type.
In /config/setting.inc there is no character encoding string, only the MySQL engine, which is set to InnoDB and matches what I see in PHPMyAdmin.
I exported ps_product_lang, replaced all instances of these characters with the correct characters, saved the CSV file in UTF-8 format, and reimported it using PHPMyAdmin, specifying UTF-8 as the character set.
However, after doing a new search in PHPMyAdmin, I now have about 10 times as many instances of these bad characters in ps_product_lang as I started with.
If the problem is as simple as specifying the correct language attribute in the database connection string, where and how do I set it, and to what value?
Incidentally, I tried running this command in PHPMyAdmin, as mentioned in this thread, but the problem remains:
SET NAMES utf8
UPDATE: PHPMyAdmin says:
MySQL charset: UTF-8 Unicode (utf8)
This is the same character set I used in the last import file, which caused more character corruptions. UTF-8 was specified as the charset of the import file during the import process.
UPDATE2
Here is a sample:
people are truly living untetheredâ€ïâ€Â
Ã‚ï† buying and renting movies online, downloading software, and
sharing and storing files on the web.
UPDATE3
I ran an SQL command in PHPMyAdmin to display the character sets:
- character_set_client utf8
- character_set_connection utf8
- character_set_database latin1
- character_set_filesystem binary
- character_set_results utf8
- character_set_server latin1
- character_set_system utf8
So, perhaps my database needs to be converted (or deleted and recreated) to UTF-8. Could this pose a problem if the MySQL server is latin1?
Can MySQL handle the translation of serving content as UTF8 but storing it as latin1? I don't think it can, as UTF8 is a superset of latin1. My web hosting support has not replied in 48 hours. Might be too hard for them.
If the charset of the tables is the same as that of their content, try using mysql_set_charset('UTF8', $link_identifier). Note that MySQL uses UTF8 to specify the UTF-8 encoding, rather than the more common UTF-8. See my other answer on a similar question too.
This is surely an encoding problem. Your database and your website use different encodings, and that mismatch is the cause of the problem. Also, if you ran that command, you still have to change the records that are already in your tables to convert those characters to UTF-8.
Update: Based on your last comment, the core of the problem is that your database and your data source (the CSV file) use different encodings. Hence you can either convert your database to UTF-8 or, at the very least, convert the data from the CSV from UTF-8 to latin1 when you import it.
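As a rough illustration of the second option (converting the CSV data before import), here is a minimal Python sketch. It assumes the CSV is valid UTF-8 and that the target is MySQL's latin1, which corresponds to Python's cp1252 codec. The function name convert_file_encoding and the file names are hypothetical and not part of the original answer.

```python
import os
import tempfile


def convert_file_encoding(src_path, dst_path,
                          src_encoding="utf-8", dst_encoding="cp1252"):
    """Re-encode a text file line by line.

    Characters that do not exist in the target encoding are written as '?'
    (errors="replace"), so check the output if the data contains characters
    outside Western European text.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, errors="replace",
              newline="") as dst:
        for line in src:
            dst.write(line)


# Small demonstration on a temporary file (hypothetical sample data).
work_dir = tempfile.mkdtemp()
utf8_csv = os.path.join(work_dir, "products_utf8.csv")
latin1_csv = os.path.join(work_dir, "products_latin1.csv")

with open(utf8_csv, "w", encoding="utf-8", newline="") as f:
    f.write("id,name\n1,café\n")

convert_file_encoding(utf8_csv, latin1_csv)

with open(latin1_csv, "rb") as f:
    print(f.read())  # 'é' is now a single byte instead of two
```

Converting the database itself to UTF-8 avoids the lossy '?' replacement, so this is only worth doing when the database has to stay latin1.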
You can do the conversion by following this article:
This appears to be a UTF-8 encoding issue that may have been caused by a double-UTF8-encoding of the database file contents.
This situation can arise from factors such as the character set that was or was not selected (for instance when a database backup file was created) and the file format and encoding the database file was saved with.
I have seen these strange UTF-8 characters in the following scenario (the description may not be entirely accurate as I no longer have access to the database in question):
Looking into the file contents:
So, the issue is that "false" (UTF8-encoded twice) utf-8 needs to be converted back into "correct" utf-8 (only UTF8-encoded once).
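To make the "encoded twice" idea concrete, here is a minimal Python sketch. It assumes the extra layer came from UTF-8 bytes being misread as cp1252 (the Windows flavour of latin1, which is also what MySQL calls latin1). The function names are hypothetical.

```python
def double_encode(text: str) -> str:
    """Simulate the corruption: the UTF-8 bytes are misread as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_double_encode(text: str) -> str:
    """Reverse it: recover the original bytes, then decode them as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


original = "café – untethered…"
broken = double_encode(original)

print(broken)                                  # mojibake of the kind in the question
print(broken.encode("utf-8"))                  # the "false" UTF-8 stored on disk
print(undo_double_encode(broken) == original)  # the round trip restores the text
```

The reverse step raises an exception if the text contains characters that cp1252 cannot represent or if the recovered bytes are not valid UTF-8, which is a useful signal that a string was not double-encoded in this particular way.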
Trying to fix this in PHP turns out to be a bit challenging:
utf8_decode() is not able to process the characters.
iconv() fails with "Notice: iconv(): Detected an illegal character in input string".
mb_convert_encoding(), another otherwise reasonable option, also fails silently in this scenario.
Trying to fix the encoding in MySQL by converting the database character set and collation to UTF-8 was unsuccessful:
I see a couple of ways to resolve this issue.
The first is to make a backup with correct encoding (the encoding needs to match the actual database and table encoding). You can verify the encoding by simply opening the resulting SQL file in a text editor.
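Besides opening the file in a text editor, the check can be roughly automated. The following Python sketch is an assumption-laden heuristic, not a definitive test: it reports whether the bytes are valid UTF-8 and counts two byte sequences ("Ã" and "â€" encoded as UTF-8) that typically show up when text has been UTF-8-encoded twice through cp1252. The name inspect_dump is hypothetical.

```python
# Byte sequences that commonly appear in double-encoded text (heuristic only;
# legitimate text can contain "Ã" as well).
MARKERS = {
    "Ã": "Ã".encode("utf-8"),    # prefix of double-encoded accented letters
    "â€": "â€".encode("utf-8"),  # prefix of double-encoded dashes and quotes
}


def inspect_dump(data: bytes) -> dict:
    """Return a small report about the encoding of a dump's raw bytes."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    counts = {name: data.count(seq) for name, seq in MARKERS.items()}
    return {"valid_utf8": valid_utf8, "marker_counts": counts}


clean = "café".encode("utf-8")
twice = "café".encode("utf-8").decode("cp1252").encode("utf-8")

print(inspect_dump(clean))
print(inspect_dump(twice))
```

For a real backup, the bytes would come from reading the SQL file in binary mode, for example open(path, "rb").read().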
The other is to replace double-UTF8-encoded characters with single-UTF8-encoded characters. This can be done manually in a text editor. To assist in this process, you can look up the incorrect characters in the UTF-8 Encoding Debugging Chart (it may be a matter of replacing 5-10 errors).
Finally, a script can assist in the process:
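The script from the original answer is not shown here. As a stand-in, the following Python sketch automates the search-and-replace approach described above: it builds a small lookup table in the spirit of a debugging chart and applies it to a string. It assumes the damage came from UTF-8 bytes being misread as cp1252; the character list and the function names are hypothetical.

```python
def build_mojibake_table(characters):
    """Map each character's double-encoded form back to the character."""
    table = {}
    for ch in characters:
        try:
            broken = ch.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            # Some UTF-8 bytes have no cp1252 meaning; skip those characters.
            continue
        table[broken] = ch
    return table


def repair(text, table):
    """Replace every known broken sequence with the intended character."""
    # Replace longer sequences first so a short key never clobbers
    # part of a longer one.
    for broken in sorted(table, key=len, reverse=True):
        text = text.replace(broken, table[broken])
    return text


# Hypothetical list of characters expected in the data.
COMMON = "éèàüöäñç–…’“€"
table = build_mojibake_table(COMMON)

sample = "café – untethered…".encode("utf-8").decode("cp1252")
print(sample)
print(repair(sample, table))
```

Because this is plain string replacement, it also rewrites any legitimate occurrence of a sequence such as "Ã©", so it is best run on a copy of the data and reviewed afterwards.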
I encountered quite a similar problem today: mysqldump dumped my UTF-8 database with each UTF-8 diacritic character encoded as two latin1 characters, although the file itself is regular UTF-8.
For example: "é" was encoded as the two characters "Ã©". These two characters correspond to the two-byte UTF-8 encoding of the letter, but they should be interpreted as a single character.
To solve the problem and correctly import the database on another server, I had to convert the file using the ftfy (which stands for "Fixes Text For You") Python library (https://github.com/LuminosoInsight/python-ftfy). The library does exactly what I expect: it transforms badly encoded UTF-8 into correctly encoded UTF-8.
For example: the latin1 combination "Ã©" is turned into an "é".
ftfy comes with a command-line script, but it transforms the file in such a way that it cannot be imported back into MySQL.
I wrote a python3 script to do the trick :
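That script is not shown here. The sketch below is not the author's script and does not use ftfy. It is a standard-library approximation of the same idea, assuming the dump is UTF-8 text whose non-ASCII characters were misread as cp1252. Lines that do not survive the round trip are left untouched. The names fix_line and fix_stream are hypothetical.

```python
import io


def fix_line(line: str) -> str:
    """Remove one layer of UTF-8-read-as-cp1252 damage from a line.

    If the line cannot be round-tripped (for example because it is already
    correct), it is returned unchanged.
    """
    try:
        return line.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line


def fix_stream(src, dst):
    """Copy text from src to dst, repairing each line independently."""
    for line in src:
        dst.write(fix_line(line))


# Demonstration with in-memory streams instead of real dump files.
broken_dump = io.StringIO(
    "INSERT INTO t VALUES ('caf\u00c3\u00a9');\n"
    "INSERT INTO t VALUES ('plain');\n"
)
fixed_dump = io.StringIO()
fix_stream(broken_dump, fixed_dump)
print(fixed_dump.getvalue())
```

With real files, both streams would be opened with encoding="utf-8". A line that mixes correct and damaged characters fails the round trip and stays as it was, which is one of the cases a dedicated library such as ftfy is designed to handle better.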
Apply these two things.
You need to set the character set of your database to utf8.
You need to call mysql_set_charset('utf8') in the file where you make the connection to the database, right after selecting the database with mysql_select_db. That will allow you to add and retrieve data properly in whatever language.
The error usually gets introduced when the CSV is created. Try using Linux to save the CSV as a Text CSV. LibreOffice on Ubuntu can enforce UTF-8 as the encoding, which worked for me.
I wasted a lot of time trying this on Mac OS. Linux is the key. I've tested on Ubuntu.
Good luck.