How to use strtr() to translate multibyte/accented/umlaut characters?

Posted on 2024-08-31 11:08:37

Does anyone have a multibyte variant of the strtr() function?

Example of desired usage:

Example:
$from = 'ľľščťžýáíŕďňäô'; // these chars are in UTF-8
$to   = 'llsctzyairdnao';

// input - in UTF-8
$str  = 'Kŕdeľ ďatľov učí koňa žrať kôru.';
$str  = mb_strtr( $str, $from, $to );

// output - str without diacritic
// $str = 'Krdel datlov uci kona zrat koru.';

Comments (4)

枯寂 2024-09-07 11:08:40
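// Multibyte-aware strtr(): $map is an array of search => replacement strings.
function mb_strtr($str, $map, $enc = 'UTF-8') {
    $out = '';
    $strLn = mb_strlen($str, $enc);
    // Find the longest key so that longer matches can be tried first.
    $maxKeyLn = 1;
    foreach ($map as $key => $val) {
        $keyLn = mb_strlen($key, $enc);
        if ($keyLn > $maxKeyLn) {
            $maxKeyLn = $keyLn;
        }
    }
    for ($offset = 0; $offset < $strLn; ) {
        // Try the longest candidate substring first, then shorter ones.
        for ($ln = $maxKeyLn; $ln >= 1; $ln--) {
            $cmp = mb_substr($str, $offset, $ln, $enc);
            if (isset($map[$cmp])) {
                $out .= $map[$cmp];
                $offset += $ln;
                continue 2;
            }
        }
        // No key matches here: copy one character through unchanged.
        $out .= mb_substr($str, $offset, 1, $enc);
        $offset++;
    }
    return $out;
}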
败给现实 2024-09-07 11:08:40

Using str_replace is probably a good solution. An alternative:

<?php
header('Content-Type: text/plain;charset=utf-8');

function my_strtr($inputStr, $from, $to, $encoding = 'UTF-8') {
        $inputStrLength = mb_strlen($inputStr, $encoding);

        $translated = '';

        for($i = 0; $i < $inputStrLength; $i++) {
                $currentChar = mb_substr($inputStr, $i, 1, $encoding);

                $translatedCharPos = mb_strpos($from, $currentChar, 0, $encoding);

                if($translatedCharPos === false) {
                        $translated .= $currentChar;
                }
                else {
                        $translated .= mb_substr($to, $translatedCharPos, 1, $encoding);
                }
        }

        return $translated;
}


$from = 'ľľščťžýáíŕďňä'; // these chars are in UTF-8
$to   = 'llsctzyairdna';

// input - in UTF-8
$str  = 'Kŕdeľ ďatľov učí koňa žrať kôru.';

print 'Original: ';
print chr(10);
print $str;

print chr(10);
print chr(10);

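print 'Translated: ';
print chr(10);
print my_strtr( $str, $from, $to);

On my machine, using PHP 5.2, this prints:

Original: 
Kŕdeľ ďatľov učí koňa žrať kôru.

Translated: 
Krdel datlov uci kona zrat kôru.

(The ô in kôru is left as-is because the $from string in this example does not include it.)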
燕归巢 2024-09-07 11:08:40

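strtr() has two valid signatures: strtr($string, $from, $to), which maps single bytes, and strtr($string, $replace_pairs), which takes an associative array of search => replacement strings.

The way that you have implemented strtr() performs byte-by-byte translations -- this is inappropriate for your multibyte characters, because each accented letter occupies two or more bytes in UTF-8.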

$from = 'ľľščťžýáíŕďňäô'; // these chars are in UTF-8
$to   = 'llsctzyairdnao';

$str  = 'Kŕdeľ ďatľov učí koňa žrať kôru.';
echo strtr($str, $from, $to);
// Kd�deyn y�atynov uyaa� kod�a dnradr ka�ru.

The correct approach is to feed the function an associative array of strings to translate -- this form matches whole byte sequences, so it is multibyte-safe.

$trans = [
    'ľ' => 'l',
    'š' => 's',
    'č' => 'c',
    'ť' => 't',
    'ž' => 'z',
    'ý' => 'y',
    'á' => 'a',
    'í' => 'i',
    'ŕ' => 'r',
    'ď' => 'd',
    'ň' => 'n',
    'ä' => 'a',
    'ô' => 'o',
];
echo strtr($str, $trans);
// Krdel datlov uci kona zrat koru.
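
If the character lists already exist as two parallel strings, as in the question, the array can be built rather than typed by hand. A minimal sketch, assuming valid UTF-8 input; the helper name utf8_strtr is hypothetical and not a PHP built-in:

```php
<?php
// Hypothetical helper (not part of PHP): strtr() semantics for UTF-8 strings.
function utf8_strtr(string $str, string $from, string $to): string
{
    // Split both lists into arrays of whole UTF-8 characters.
    // Assumption: $from and $to are valid UTF-8, so preg_split() does not fail.
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);

    // Like strtr(), ignore the surplus characters of the longer list.
    $n = min(count($fromChars), count($toChars));
    if ($n === 0) {
        return $str;
    }

    // Duplicate keys are harmless: a later pair overwrites an earlier one.
    $map = array_combine(
        array_slice($fromChars, 0, $n),
        array_slice($toChars, 0, $n)
    );

    // The array form of strtr() replaces whole byte sequences in one pass.
    return strtr($str, $map);
}

echo utf8_strtr('Kŕdeľ ďatľov učí koňa žrať kôru.', 'ľľščťžýáíŕďňäô', 'llsctzyairdnao');
// Krdel datlov uci kona zrat koru.
```

This keeps the three-argument calling style from the question while delegating the actual replacement to the array form of strtr().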

It should also be noted that libraries and native functions exist for this task, for example the intl extension's Transliterator class and iconv() with the //TRANSLIT suffix.
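
One native option is the Transliterator class from PHP's intl extension. A minimal sketch, assuming the intl extension is installed; the function name strip_diacritics is hypothetical:

```php
<?php
// Hypothetical helper: remove diacritics via ICU transliteration.
// Requires the intl extension; returns null when it is unavailable.
function strip_diacritics(string $str): ?string
{
    if (!class_exists('Transliterator')) {
        return null;
    }
    // "Any-Latin" converts other scripts to Latin; "Latin-ASCII" then
    // folds accented Latin letters to their closest ASCII equivalents.
    $tr = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($tr === null) {
        return null;
    }
    $result = $tr->transliterate($str);
    return $result === false ? null : $result;
}

var_dump(strip_diacritics('Kŕdeľ ďatľov učí koňa žrať kôru.'));
```

Unlike a hand-written map, this also handles characters the map's author did not anticipate, at the cost of depending on intl.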

木有鱼丸 2024-09-07 11:08:39

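I believe strtr is multibyte-safe (this holds for its array form, though not for the three-string form). Either way, since str_replace matches whole byte sequences and is therefore safe for UTF-8, you could wrap it: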

function mb_strtr($str, $from, $to)
{
  return str_replace(mb_str_split($from), mb_str_split($to), $str);
}

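PHP versions before 7.4 have no built-in mb_str_split function, so there you also need to write your own (using mb_substr and mb_strlen), or you could just use the PHP UTF-8 implementation (changed slightly):

if (!function_exists('mb_str_split')) {
    function mb_str_split($str) {
        // An empty pattern in UTF-8 mode yields one array element per character.
        return preg_split('~~u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
}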
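
One caveat with the str_replace wrapper above: str_replace applies its pairs one after another, so the output of an earlier pair can be rewritten by a later one, whereas the array form of strtr never re-translates text it has already replaced. This does not matter when every search character is accented and every replacement is plain ASCII, but it does for overlapping maps. A small illustration with made-up input:

```php
<?php
// Swap "a" and "b" using both approaches.
$viaStrReplace = str_replace(['a', 'b'], ['b', 'a'], 'ab');
$viaStrtr      = strtr('ab', ['a' => 'b', 'b' => 'a']);

echo $viaStrReplace, "\n"; // aa  (a->b gives "bb", then b->a gives "aa")
echo $viaStrtr, "\n";      // ba  (each character is translated exactly once)
```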

However, if you're looking for a function to remove all (Latin?) accents from a string, you might find the following function useful:

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}

echo Unaccent('ľľščťžýáíŕďňä'); // llsctzyairdna
echo Unaccent('Iñtërnâtiônàlizætiøn'); // Internationalizaetion