I want to replace every character matching [^a-zа-з0-9_] with an empty string, but I can't get it to work when the input is a multibyte string.
I tried mb_*, iconv, PCRE, mb_eregi_replace, and the u modifier (for PCRE), but none of them worked well.
mb_eregi_replace runs and outputs a correct UTF-8 string, but it doesn't replace the characters, whereas preg_replace works with the same regex.
Here is my code. It handles Unicode, but it doesn't replace the text:
function _data($data)
{
    mb_regex_encoding('UTF-8');
    return mb_eregi_replace('/[^a-zа-з0-9_]+/', '', $data);
}
var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));
The result still contains the special characters (#_$..) when they should have been removed. If I change the function to preg_replace (without Unicode), it replaces them.
As long as your input string is UTF-8 encoded (test whether it is, and re-encode it to UTF-8 if not), you can safely use preg_replace, provided you use the correct regular expression with the u (PCRE_UTF8) modifier (https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php). This is the lower-case u at the end of the pattern.
\w = any word character
u (at the end) = enable UTF-8 for the regex
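The pattern from the original demo is not preserved here, so the following is only a minimal sketch of what the advice above might look like in practice. The function names strip_non_word and strip_outside_ranges are hypothetical. The second variant assumes the question's range а-з was meant to be а-я (а-з covers only the first eight Cyrillic letters) and adds ё, which sits outside that range.

```php
<?php
// Hypothetical helper names; neither appears in the original post.

// Remove everything that is not a letter, digit, or underscore.
// With the u modifier, \w also covers non-ASCII letters such as Cyrillic.
// Returns null when $data is not valid UTF-8, because preg_replace
// fails on malformed input once the u modifier is set.
function strip_non_word(string $data): ?string
{
    return preg_replace('/\W+/u', '', $data);
}

// Variant with explicit ranges, closer to the pattern in the question.
// Assumption: the intended Cyrillic range is а-я plus ё, not а-з.
// The i modifier makes the ranges match upper-case letters as well.
function strip_outside_ranges(string $data): ?string
{
    return preg_replace('/[^a-zа-яё0-9_]+/iu', '', $data);
}

$input = 'Текст Removethis- and this _#$)( and also this $*@&$';
var_dump(strip_non_word($input));
var_dump(strip_outside_ranges($input));
```

For the sample string from the question, both calls should leave ТекстRemovethisandthis_andalsothis. Note that preg_* patterns need delimiters such as /.../, while the mb_ereg* functions take the bare pattern, so the slashes in the question's mb_eregi_replace call are matched literally and nothing is replaced.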