List of "messed up characters" in UTF-8
One of my clients has a website that has been totally messed up by the hosting company forcing a character set on the entire database. We've had trouble with character sets before, but now it's an outright drama!
So far I've added charset=utf-8 to the page's content type and set the charset of the MySQL connection to utf8. Now it's time to replace all the garbled characters. So far, this is what I've found:
Ã¶ = ö
Ã« = ë
Ã© = é
The data inside the database is being updated like so:
UPDATE table SET `fieldname` = REPLACE(`fieldname`, 'Ã¶', 'ö');
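Rather than typing each REPLACE by hand, the statements can be generated from a mapping. A minimal Python sketch, assuming a DB-API driver that uses the %s parameter style (PyMySQL and mysqlclient both do); `table` and `fieldname` are the placeholder names from the query above, and the mapping holds only the three pairs found so far:

```python
# Garbled sequence -> intended character (the three pairs found so far).
FIXES = {
    "Ã¶": "ö",
    "Ã«": "ë",
    "Ã©": "é",
}


def build_replace_statements(table: str, column: str, fixes: dict[str, str]):
    """Return (sql, params) pairs, one UPDATE ... REPLACE per mapping entry.

    Table and column names cannot be bound as parameters, so they are quoted
    with backticks here and must come from trusted code, never user input.
    """
    sql = f"UPDATE `{table}` SET `{column}` = REPLACE(`{column}`, %s, %s)"
    return [(sql, (bad, good)) for bad, good in fixes.items()]


# Usage with a DB-API cursor (connection setup omitted):
# for sql, params in build_replace_statements("table", "fieldname", FIXES):
#     cursor.execute(sql, params)
```

Binding the garbled and correct strings as parameters avoids quoting problems; it still relies on the connection charset matching how the garbled text is actually stored.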
Now I just need a complete list of all the characters that are messed up. I tried a MySQL query searching for field LIKE '%Ã%', but it returns every record in the database.
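A likely reason the LIKE query matches everything is an accent-insensitive collation (such as MySQL's default latin1_swedish_ci or utf8_general_ci), under which 'Ã' compares equal to a plain 'A'. One way around that is to test the values in application code instead. A minimal Python sketch, assuming the garbling came from UTF-8 bytes being read as Windows-1252 (which is what MySQL's latin1 actually is); the function name is hypothetical:

```python
def looks_double_encoded(value: str, wrong_charset: str = "cp1252") -> bool:
    """Heuristic: True if `value` looks like UTF-8 text decoded as `wrong_charset`.

    Re-encoding with the wrong charset recovers the original bytes; if those
    bytes are valid UTF-8 and decode to something different, the value was
    most likely garbled. Plain ASCII round-trips unchanged and returns False.
    """
    try:
        repaired = value.encode(wrong_charset).decode("utf-8")
    except UnicodeError:  # not representable in wrong_charset, or not valid UTF-8
        return False
    return repaired != value
```

False positives are possible for text that legitimately contains sequences such as 'Ã©', so matches are worth reviewing before anything is rewritten.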
Google also only shows a couple of characters (mostly the 3 above) in threads from other people who have had similar trouble, but there seems to be no complete list of these characters (or at least the most common ones) that I could use to find and replace all of my client's data.
If anyone knows where to find such a list, or can complete mine, I will in return create a page listing these characters to help others (unless, of course, such a list already exists somewhere and I'm not aware of it).
// EDIT:
It would be for the most common European characters, such as é è ë, á à ä, ö ó ò, ï, ü, and perhaps the Ringel-S (German sharp S, ß). Characters with a tilde, like ñ or ã, matter less, but if they are in a list somewhere that would be much appreciated as well.
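Such a list does not have to be collected by hand: each garbled form is simply the character's UTF-8 bytes read as a single-byte charset. A minimal Python sketch that derives the table for the characters above, assuming the bytes were misread as Windows-1252 (the charset behind MySQL's latin1); pass "latin-1" instead if plain ISO-8859-1 was involved:

```python
# The characters mentioned in the question.
TARGETS = "éèëáàäöóòïüßñã"


def garble(ch: str, wrong_charset: str = "cp1252") -> str:
    """How `ch` appears when its UTF-8 bytes are decoded as `wrong_charset`."""
    return ch.encode("utf-8").decode(wrong_charset)


def mojibake_table(chars: str = TARGETS, wrong_charset: str = "cp1252") -> dict[str, str]:
    """Map each garbled form back to the intended character."""
    return {garble(c, wrong_charset): c for c in chars}


if __name__ == "__main__":
    for bad, good in mojibake_table().items():
        # repr() makes invisible characters visible, e.g. the no-break space in 'Ã\xa0' (à).
        print(f"{bad!r} = {good}")
```

Uppercase letters need more care: Á, for instance, is C3 81 in UTF-8, and 0x81 is undefined in Python's cp1252 codec, so garble("Á") raises UnicodeDecodeError unless error handling is added.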
// EDIT 2:
I updated the MySQL database and tables using the 2 ALTER queries from the first part of this article: http://developer.loftdigital.com/blog/php-utf-8-cheatsheet. I have NOT used the mb_ functions so far and, as far as I can tell, haven't done any MB configuration.
The headers in all files are set to utf-8 (I still have to check the headers of some AJAX scripts; I'm not sure that's needed, but it won't hurt). The files are all saved as UTF-8 without BOM. PHPFreakMailer has also been updated by setting its charset to utf-8.
Unfortunately, I'm still seeing these weird characters. I didn't expect them to go away by themselves, but it was worth hoping :-) So what's the final step I should take? Keep using the REPLACE query and change all the weird characters manually?
Thanks in advance!
Answers (4)
This is a bit crazy; what character set do you think "Ã¶" is in?
It looks like that's actually a correct UTF-8 sequence (since it's two bytes); you're just displaying it as ISO-8859-1.
Edit:
Based on your comment I think the following is going on:
I think (but I'm really not 100% sure) that the correct UTF-8 byte sequence is stored in the database. But because the table is marked as ISO-8859-1 and you asked for the character set to be converted automatically, MySQL treats those bytes as ISO-8859-1 (where they look like Ã¶) and then converts that to UTF-8.
You should be able to verify this by checking whether strlen() of the 'ö' you fetch is 4 rather than 2. If the length is in fact 2, something in your browser's encoding is going wrong.
To fix this, don't have MySQL convert the characters.
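The byte counts behind this check can be illustrated in Python, where len() on bytes counts bytes just as PHP's strlen() does. A small sketch showing that a correctly encoded 'ö' is 2 bytes and a double-encoded one is 4:

```python
# A correctly encoded 'ö' in UTF-8.
correct = "ö".encode("utf-8")
# Double encoding: those two bytes are misread as ISO-8859-1 ('Ã¶')
# and then encoded to UTF-8 a second time.
double = correct.decode("iso-8859-1").encode("utf-8")

print(len(correct))  # 2
print(len(double))   # 4
print(double)        # b'\xc3\x83\xc2\xb6'
```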
Option 2
The data could also be "double encoded" in the table. To check this, also check the string length in the database itself (MySQL's LENGTH() returns bytes, while CHAR_LENGTH() returns characters). If the 'ö' is 4 bytes long, this is the issue.
My advice in that case is not to build a big "messed up character" map. You should simply be able to utf8_decode() the string. Normally this function outputs an ISO-8859-1 string, but in your case it should turn out to be the original, valid UTF-8 string.
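In Python terms, one utf8_decode() step amounts to re-encoding the text as ISO-8859-1 and decoding the resulting bytes as UTF-8. A minimal sketch (the function name is hypothetical). It also falls back to Windows-1252, on the assumption that the double encoding went through MySQL's latin1: a character such as ß then garbles into 'ÃŸ', whose 'Ÿ' is outside ISO-8859-1, so utf8_decode() would turn it into '?':

```python
def undo_double_encoding(value: str) -> str:
    """Reverse one layer of UTF-8 -> Latin-1 -> UTF-8 double encoding.

    Tries ISO-8859-1 first (what PHP's utf8_decode() targets), then
    Windows-1252, which covers garbled forms such as 'ÃŸ' (ß).
    Values that do not look double encoded are returned unchanged.
    """
    for charset in ("iso-8859-1", "cp1252"):
        try:
            return value.encode(charset).decode("utf-8")
        except UnicodeError:
            continue
    return value
```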
I hope this works!
Edit2
OK, so I believe what has effectively happened is Option 2. To put it in simple (PHP) terms:
So one utf8_decode() should be enough.
Do test this before you run your migration scripts though :)
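One way to follow that advice is a dry run that reports what would change before anything is written back. A minimal sketch; `rows` stands in for hypothetical (id, value) pairs fetched from an affected column, and the function name is hypothetical too. It applies the same ISO-8859-1 re-decoding as one utf8_decode() call:

```python
def plan_migration(rows):
    """Return (id, old, new) for every row whose value would change.

    Each value is re-encoded as ISO-8859-1 and decoded as UTF-8, i.e. one
    utf8_decode() step. Rows where that fails (already-correct text, or text
    outside ISO-8859-1) are skipped so they can be reviewed separately.
    """
    changes = []
    for row_id, value in rows:
        try:
            fixed = value.encode("iso-8859-1").decode("utf-8")
        except UnicodeError:
            continue
        if fixed != value:
            changes.append((row_id, value, fixed))
    return changes


if __name__ == "__main__":
    sample = [(1, "Ã¶ffnen"), (2, "plain"), (3, "öffnen")]
    for row_id, old, new in plan_migration(sample):
        print(row_id, repr(old), "->", repr(new))
```

Only after the report looks right would the fixed values be written back, ideally on a copy of the database first.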
If they forced a character set change, why wasn't your database converted? Are your tables still in the old character set? (Check the table information in phpMyAdmin.)
Is the data wrong when it shows up in phpMyAdmin, or only on your web page? Your SET NAMES and collation settings should change, as well as your headers and file encoding (save files as UTF-8).
Or try:
I would start replacing characters only if there are no options from within MySQL left.
Since you've tagged this question with "php", I assume you read the database and its values with PHP? If so, have a look at mb_convert_encoding if you no longer have control over the database.
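mb_convert_encoding($string, $to_encoding, $from_encoding) re-encodes a byte string from one charset to another. Its rough Python counterpart is a decode followed by an encode; a minimal sketch (the function name is hypothetical) for data that really is stored as single-byte ISO-8859-1 rather than double-encoded UTF-8:

```python
def convert_encoding(data: bytes, to_encoding: str, from_encoding: str) -> bytes:
    """Rough analogue of PHP's mb_convert_encoding($data, $to, $from)."""
    return data.decode(from_encoding).encode(to_encoding)


latin1_bytes = "ö".encode("iso-8859-1")  # b'\xf6', a single byte
utf8_bytes = convert_encoding(latin1_bytes, "utf-8", "iso-8859-1")
print(utf8_bytes)  # b'\xc3\xb6'
```

Applied to text that is already UTF-8, the same conversion adds a second layer of encoding instead of removing one, so it is worth confirming what the stored bytes are first.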
The better solution would be to fix the inconsistency between the data and the tables' character set. Back up the database (just in case), and alter all tables and columns to UTF-8. Note: with MySQL it is not enough to alter a table's default charset; you have to do this for each column.
Why don't you use &auml; = ä, &ouml; = ö, ...? Call htmlentities() in PHP and it will convert all special characters into entities. I think this would be the easiest way to do it.
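For comparison, here is roughly what htmlentities() does to non-ASCII characters, sketched in Python with the standard html.entities table. The function name is hypothetical, and unlike htmlentities() it leaves &, <, > and quotes alone. Entities only help while the text is still correct; already garbled data is converted as-is:

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


print(to_named_entities("Käse"))  # K&auml;se
print(to_named_entities("Ã¶"))    # &Atilde;&para;
```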