I get double characters when using preg_replace...?
When I use the following script, I get double characters. Why?
$clean_lastname = "Dür";
$clean_lastname = preg_replace("/[ùúûü]/", "u", $clean_lastname);
echo $clean_lastname;
Output: Duur
I want it to be Dur.
I am still doing something wrong... What goes wrong when I put one value of an array into the preg function?
$clean_lastname = "Boerée";
$l = 0;
$pattern = array('[ÀÁÂÃÄÅ]','[Ç]','[ÈÉÊË]','[ÌÍÎÏ]','[Ñ]','[ÒÓÔÕÖØ]','[Ý]','[ß]','[àáâãäå]','[ç]','[èéêë]','[ìíîï]','[ñ]','[òóôõöø]','[ùúûü]','[ýÿ]');
$replace = array(A,C,E,I,N,O,Y,S,a,c,e,i,n,o,u,y);
foreach ($pattern as $wierdchar)
{
$clean_lastname = preg_replace('/$wierdchar/u', '$replace[$l]', $clean_lastname);
$l++;
}
//$clean_lastname = preg_replace('/[èéêë]/u', 'e', $clean_lastname);
//$clean_lastname = strtr($clean_lastname, "ùúûü","uuuu");
echo $clean_lastname;
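A note on the loop above, offered as a minimal sketch rather than a definitive fix: PHP does not interpolate variables inside single-quoted strings, so '/$wierdchar/u' and '$replace[$l]' are passed to preg_replace literally, and the unquoted A, C, E ... in $replace are read as constants instead of strings. preg_replace also accepts arrays of patterns and replacements, so the loop is not needed. The shortened pattern list and the assumption that the file is saved as UTF-8 are mine, not the asker's.

```php
<?php
// Assumption: this file is saved as UTF-8.
// Each pattern is a complete regex: delimiters plus the u modifier.
$patterns     = ['/[ÀÁÂÃÄÅ]/u', '/[èéêë]/u', '/[ùúûü]/u'];
// Replacements are quoted strings, paired with the patterns by position.
$replacements = ['A', 'e', 'u'];

// preg_replace applies every pattern/replacement pair in order.
$clean_lastname = preg_replace($patterns, $replacements, "Boerée");

echo $clean_lastname, "\n"; // Boeree
```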
// OR to solve your initial issue:
The only situation in which I can imagine this happening is when your two strings (the input string and the pattern) have different character encodings, or both are UTF-8 but you didn't specify that properly.
Because in the latter case, "Dür" is equivalent to "D\xC3\xBCr" (ü is encoded as the two-byte sequence 0xC3 0xBC) and the pattern "/[ùúûü]/" is equivalent to "/[\xC3\xB9\xC3\xBA\xC3\xBB\xC3\xBC]/". Since each byte specified by an escape sequence \xHH is treated as a single character, both bytes of ü match the character class and each one is replaced with u, which gives Duur.
So when working with UTF-8, make sure to set the u modifier flag so that the pattern and the input string are treated as UTF-8 encoded.
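To make the difference concrete, here is a small sketch comparing the same pattern with and without the u modifier. It assumes the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8.
$name = "D\xC3\xBCr"; // "Dür" written out as its UTF-8 bytes

// Without u, the class is a set of single bytes, so 0xC3 and 0xBC
// each match on their own and each becomes a "u".
$bytewise = preg_replace("/[ùúûü]/", "u", $name);

// With u, pattern and subject are read as UTF-8, so ü is one character.
$unicode = preg_replace("/[ùúûü]/u", "u", $name);

echo $bytewise, "\n"; // Duur
echo $unicode, "\n";  // Dur
```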
Edit: Now that you have clarified your intentions and seem to be trying to implement some kind of transliteration, you should take a look at iconv and its ability to transliterate.
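A minimal sketch of that iconv call, assuming the input is UTF-8 and the iconv extension is available. The result of //TRANSLIT depends on the iconv implementation and the active locale, so no expected output is shown; the locale names passed to setlocale are an assumption and must be installed on the system.

```php
<?php
// Assumption: one of these locales is installed; transliteration
// results can differ (or fail) depending on locale and iconv library.
setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

// Ask iconv to approximate characters that ASCII cannot represent.
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', "Boerée");

// iconv returns false on failure, so check before using the result.
var_dump($ascii);
```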
Stick with your original strtr.
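One caveat worth sketching: the three-argument form strtr($str, "ùúûü", "uuuu") maps byte to byte, so it does not work on UTF-8 text where each accented letter takes two bytes. The array form replaces whole substrings and avoids that. The map below is a shortened, illustrative one and assumes the file is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8.
// Array form: each key is replaced as a whole (multi-byte) substring.
$map = [
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
];

$first  = strtr("Dür", $map);
$second = strtr("Boerée", $map);

echo $first, "\n";  // Dur
echo $second, "\n"; // Boeree
```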