Replacing diacritics with "equivalent" ASCII characters in PHP?

Posted on 2024-08-29 07:51:06

Related questions:

  1. How to replace characters in a java String?
  2. How to replace special characters with their equivalent (such as " á " for " a") in C#?

As in the questions above, I'm looking for a reliable, robust way to reduce any Unicode character to a near-equivalent ASCII character using PHP. I really want to avoid rolling my own lookup table.

For example (borrowed from the first referenced question): Gračišće becomes Gracisce

不美如何 2024-09-05 07:51:06

The iconv module can do this, more specifically, the iconv() function:

$str = iconv('Windows-1252', 'ASCII//TRANSLIT//IGNORE', "Gracišce");
echo $str;
//outputs "Gracisce"

The main hassle with iconv is that you have to watch your encodings, but it's definitely the right tool for the job. (I used 'Windows-1252' for the example because of limitations of the text editor I was working with.) The feature of iconv that you definitely want is the //TRANSLIT flag, which tells iconv to transliterate any character that has no ASCII match into its closest approximation.
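To illustrate the point about watching encodings, here is a minimal sketch that assumes the input is UTF-8 rather than Windows-1252. The helper name to_ascii is hypothetical, and the exact transliteration depends on the iconv implementation and the current locale, so no particular output is promised:

```php
<?php
// Hypothetical helper: transliterate UTF-8 text to ASCII with iconv().
// Returns null when iconv() refuses the conversion (for example on
// invalid UTF-8 input or an unsupported //TRANSLIT target).
function to_ascii(string $text): ?string
{
    // The first argument must name the real encoding of $text;
    // declaring the wrong one garbles the result.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    return $ascii === false ? null : $ascii;
}

$result = to_ascii('Gračišće');
// Unmapped characters may come back as "?" or as extra punctuation,
// depending on the platform.
var_dump($result);
```

Checking for a false return matters because iconv() reports a failed conversion that way instead of throwing.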

绝對不後悔。 2024-09-05 07:51:06

I found another solution, based on @zombat's answer.

The issue with his answer was that I was getting:

Notice: iconv() [function.iconv]: Wrong charset, conversion from `UTF-8' to `ASCII//TRANSLIT//IGNORE' is not allowed in D:\www\phpcommand.php(11) : eval()'d code on line 3

And after removing //IGNORE from the function, I got:

Gr'a'e~a~o^O"ucisce

So, the š character was translated correctly, but the other characters weren't.

The solution that worked for me combines @zombat's solution with preg_replace, which removes everything except [a-zA-Z0-9.] (spaces included):

preg_replace('/[^a-zA-Z0-9.]/','',iconv('UTF-8', 'ASCII//TRANSLIT', "GráéãõÔücišce"));

Output:

GraeaoOucisce

染年凉城似染瑾 2024-09-05 07:51:06

My solution is to create two strings: the first holds the unwanted letters, and the second holds the letters that replace them.

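$from = 'čšć';
$to   = 'csc';
$text = 'Gračišće';

// str_split() splits by byte and would break multibyte UTF-8 characters;
// mb_str_split() (PHP 7.4+, mbstring extension) splits by character.
$result = str_replace(mb_str_split($from, 1, 'UTF-8'), str_split($to), $text);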
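The two-string idea can also be written as an explicit map passed to strtr(), which replaces whole substrings and therefore treats each multibyte UTF-8 character as a unit. A minimal sketch, assuming the source file is saved as UTF-8; the function name replace_diacritics and the map contents are illustrative and cover only the three letters from the example:

```php
<?php
// Hypothetical helper; extend the map with whatever letters you need.
function replace_diacritics(string $text): string
{
    $map = [
        'č' => 'c',
        'š' => 's',
        'ć' => 'c',
    ];
    // strtr() with an array replaces whole substrings, so each
    // multibyte UTF-8 character is matched as a unit.
    return strtr($text, $map);
}

echo replace_diacritics('Gračišće'), PHP_EOL; // Gracisce
```

Keeping each letter next to its replacement also avoids the two strings drifting out of alignment as the list grows.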

烟酉 2024-09-05 07:51:06

Try this:

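function normal_chars($string)
{
    // Encode non-ASCII characters as named HTML entities, e.g. "á" becomes "&aacute;".
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    // Keep only the base letter(s) of each accent or ligature entity: "&aacute;" becomes "a".
    $string = preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
    // Collapse every run of non-alphanumeric characters into a single space.
    $string = preg_replace('~[^0-9a-z]+~i', ' ', $string);
    return trim($string);
}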

Examples:

echo normal_chars('Álix----_Ãxel!?!?'); // Alix Axel
echo normal_chars('áéíóúÁÉÍÓÚ'); // aeiouAEIOU
echo normal_chars('üÿÄËÏÖÜŸåÅ'); // uyAEIOUYaA

Based on the selected answer in this thread: URL Friendly Username in PHP?

浅唱ヾ落雨殇 2024-09-05 07:51:06

You should also try:

echo transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', "ÀÖØöøįĴőŔžǍǰǴǵǸțȞȟȤȳɃɆɏ");

// Outputs (requires the intl extension):
// aooooijorzajggnthhzybey

I found this here:
https://www.php.net/manual/en/transliterator.transliterate.php#111939
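Since transliterator_transliterate() is only available when the intl extension is loaded, a guarded wrapper may be useful. A minimal sketch; the function name ascii_fold is hypothetical, and the transform ID is the one from the call above:

```php
<?php
// Hypothetical wrapper: returns null when the intl extension is
// missing or the transliteration fails.
function ascii_fold(string $text): ?string
{
    if (!function_exists('transliterator_transliterate')) {
        return null; // intl extension not loaded
    }
    $out = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    return $out === false ? null : $out;
}

echo ascii_fold('Gračišće') ?? '(intl extension not available)', PHP_EOL;
```

Dropping "Lower()" from the transform ID should preserve the original letter case, if that is what you need.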
