preg_replace for Cyrillic characters

Posted 2024-12-09 09:11:06

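I want to remove the characters matched by [^a-zа-з0-9_] (replace them with nothing), but I can't get it to work when the string is multibyte.

I tried the mb_* functions, iconv, PCRE, mb_eregi_replace and the u modifier (for PCRE), but none of them worked well.

mb_eregi_replace runs and returns a correctly encoded UTF-8 string, but it doesn't replace any characters, whereas preg_replace with the same regex does replace them.

Here is my code. It handles Unicode, but it doesn't replace any text: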

function _data($data)
{
  mb_regex_encoding('UTF-8');
  return mb_eregi_replace('/[^a-zа-з0-9_]+/', '', $data);
}

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));

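The result still contains the special characters (#, _, $ and so on) even though they should be replaced. If I change the function to preg_replace (without Unicode support), they do get replaced.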
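
One likely reason mb_eregi_replace changes nothing: unlike preg_replace, the mb_ereg* functions take a bare pattern without /.../ delimiters, so the slashes in '/[^a-zа-з0-9_]+/' are matched as literal / characters, and because the input contains no slash, nothing matches. Here is a minimal sketch of the same function without delimiters, assuming the mbstring extension is built with regex support. It also assumes the whole Russian alphabet should be kept: а-з covers only the eight letters а to з, so the sketch uses а-я plus ё (which lies outside that range) and lists the uppercase ranges explicitly instead of relying on case-insensitive matching.

```php
<?php
// Same function as in the question, but with a bare pattern:
// mb_eregi_replace() does not use "/.../" delimiters, so slashes in the
// pattern would be matched as literal "/" characters.
// Assumes ext-mbstring with mbregex support.
function _data($data)
{
    mb_regex_encoding('UTF-8');
    // Assumption: keep the whole Russian alphabet, i.e. а-я plus ё
    // (ё is outside а-я), with the uppercase ranges listed explicitly.
    return mb_eregi_replace('[^a-zA-Zа-яА-ЯёЁ0-9_]+', '', $data);
}

var_dump(_data('Текст Removethis- and this _#$)( and also this $*@&$'));
// string(39) "ТекстRemovethisandthis_andalsothis"
```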

Answer by 最后的乘客, 2024-12-16 09:11:06

As long as your input string is UTF-8 encoded (test whether it is, or re-encode it to UTF-8), you can safely use preg_replace with the correct regular expression and the u (PCRE_UTF8) modifier, the lower-case u at the end of the pattern (see https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php):

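function _data($data)
{
  // The u modifier makes PCRE treat the pattern and the subject as UTF-8.
  // \w already includes "_", so the extra "_" in the class is redundant but harmless.
  return preg_replace('/[^\w_]+/u', '', $data);
}

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));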

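\w = any word character (a letter, digit or _; with the u modifier this includes Cyrillic and other Unicode letters)
u (at the end) = enables UTF-8 mode for the regex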
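
To see what the u modifier changes, here is a minimal sketch that runs the answer's pattern on the question's input with and without u. The helper names strip_utf8 and strip_bytes are hypothetical, [^\w] stands in for [^\w_] because \w already covers the underscore, and the expected outputs assume PHP 8, where the default locale is "C".

```php
<?php
// Hypothetical helpers comparing the answer's pattern with and without u.
function strip_utf8($data)
{
    // u: pattern and subject are UTF-8; in PHP this also makes \w
    // Unicode-aware, so Cyrillic letters count as word characters.
    return preg_replace('/[^\w]+/u', '', $data);
}

function strip_bytes($data)
{
    // No u: PCRE works byte by byte and \w matches only ASCII word
    // characters, so the two-byte UTF-8 Cyrillic letters are removed too.
    return preg_replace('/[^\w]+/', '', $data);
}

$input = 'Текст Removethis- and this _#$)( and also this $*@&$';
var_dump(strip_utf8($input));  // string(39) "ТекстRemovethisandthis_andalsothis"
var_dump(strip_bytes($input)); // string(29) "Removethisandthis_andalsothis"
```

Without u, the Cyrillic word is lost along with the punctuation.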

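The UTF-8 precondition matters because, with the u modifier, preg_replace refuses invalid UTF-8: it returns null and preg_last_error() reports PREG_BAD_UTF8_ERROR. Below is a minimal sketch of checking the input first. The helper name strip_non_word is hypothetical, and so is the fallback: it assumes that input which is not valid UTF-8 is Windows-1251 (a common legacy Cyrillic encoding) and re-encodes it with mb_convert_encoding from the mbstring extension.

```php
<?php
// Hypothetical helper. Assumption: input that is not valid UTF-8 is
// Windows-1251; adjust the source encoding to match your data.
function strip_non_word($data)
{
    // preg_match() with an empty /u pattern returns false on invalid UTF-8.
    if (preg_match('//u', $data) !== 1) {
        $data = mb_convert_encoding($data, 'UTF-8', 'Windows-1251');
    }
    return preg_replace('/[^\w]+/u', '', $data);
}

// "Текст" encoded as Windows-1251 is not valid UTF-8, so /u fails with null.
var_dump(preg_replace('/[^\w]+/u', '', "\xD2\xE5\xEA\xF1\xF2")); // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);             // bool(true)

var_dump(strip_non_word("\xD2\xE5\xEA\xF1\xF2 #1")); // string(11) "Текст1"
```
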