Character set problem in PHP
I have a problem with PHP code that converts accented characters to unaccented ones. This code worked a year ago, but now I cannot get it to work. The conversion is not done correctly.
Here is the code:
<?php
echo accentdestroyer('azeméis');
/**
*
* This function transform accent characters to non accent characters
* @param text $string
*/
function accentdestroyer($string) {
$string=strtr($string,
"()!$?: ,&+-/.ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ"
,
"-------------SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy");
return $string;
}
?>
I have tried saving the document as UTF-8, but it gives me something like this: "azemy�is"
Any clues on what I can do to get this working correctly?
Best Regards,
A better solution may be to transliterate those characters automatically using iconv().
As for the reason your function doesn't work, it may have something to do with the fact that
echo strlen('Š');
outputs 2. The strtr() documentation explicitly refers to single-byte characters, and in a UTF-8 file each accented character occupies more than one byte.
So the first byte has been matched, but the second (leftover) one is not a byte that forms a valid Unicode character on its own.
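To illustrate the byte-length point, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that PHP 7 or later is used. The helper name accentdestroyer_pairs and its short map are hypothetical and not part of the question. The helper uses the two-argument (array) form of strtr(), which replaces whole substrings instead of single bytes, as one possible workaround.

```php
<?php
// In a UTF-8 source file, 'Š' is stored as the two bytes C5 A0,
// so strlen(), which counts bytes, reports 2.
var_dump(strlen('Š'));   // int(2)
var_dump(bin2hex('é'));  // string(4) "c3a9"

// Hypothetical helper: the array form of strtr() matches whole keys
// (multibyte sequences included) instead of translating byte by byte.
function accentdestroyer_pairs(string $string): string
{
    // Deliberately short map for illustration; extend it as needed.
    $map = [
        'Š' => 'S', 'š' => 's',
        'É' => 'E', 'é' => 'e',
        'Ç' => 'C', 'ç' => 'c',
        'Ñ' => 'N', 'ñ' => 'n',
    ];

    return strtr($string, $map);
}

echo accentdestroyer_pairs('azeméis'), "\n"; // azemeis
```

Because every key is matched as a complete byte sequence, no stray half of a multibyte character is left behind in the output.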
Update
Here is a working example using iconv().
Some characters didn't quite translate, such as ¥ and Ø, but most did. You can append //IGNORE to the output character set to silently discard the ones which don't transliterate. You could also drop all non-word characters using a Unicode regex with \pL.
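The iconv() and \pL suggestions might be combined roughly as follows. This is a sketch under stated assumptions, not the answerer's original example, which is not reproduced here. The function name ascii_slug is hypothetical, and replacing the remaining non-letter, non-digit characters with a hyphen is an assumption based on the question's mapping table. Transliteration results depend on the iconv implementation and on the LC_CTYPE locale, so no expected output is shown for accented input.

```php
<?php
// Assumption: the en_US.UTF-8 locale is installed. With glibc, //TRANSLIT
// consults LC_CTYPE; if the locale is missing, setlocale() just returns false.
setlocale(LC_CTYPE, 'en_US.UTF-8');

// Hypothetical helper combining transliteration with a Unicode regex.
function ascii_slug(string $text): string
{
    // //TRANSLIT approximates characters that have no ASCII equivalent;
    // //IGNORE silently drops the ones that cannot be converted at all.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    if ($ascii === false) {
        return ''; // conversion failed, e.g. the input was not valid UTF-8
    }

    // Replace every run of characters that are neither letters (\pL)
    // nor digits (\pN) with a single hyphen.
    $slug = preg_replace('/[^\pL\pN]+/u', '-', $ascii);

    return trim($slug ?? '', '-');
}

echo ascii_slug('azeméis'), "\n";       // result varies by platform
echo ascii_slug('hello, world!'), "\n"; // hello-world
```

Whatever placeholder a given iconv build emits for an untranslatable character (for example "?"), the regex step turns it into a hyphen, so the result stays plain ASCII.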