Split a UTF-8 string into an array of characters
I'm trying to split a UTF-8 encoded string into an array of characters. The function I use now used to work, but for some reason it no longer does. What could be the reason? Better yet, how can I fix it?
This is my string:
Zelf heb ik maar één vraag: wie ben jij?
This is my function:
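function utf8Split($str, $len = 1)
{
    $arr = array();
    // mb_strlen()/mb_substr() count characters (in mbstring's internal encoding), not bytes
    $strLen = mb_strlen($str);
    // Advance by $len so that chunks don't overlap when $len > 1
    for ($i = 0; $i < $strLen; $i += $len)
    {
        $arr[] = mb_substr($str, $i, $len);
    }
    return $arr;
}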
This is the result:
Array
(
[0] => Z
[1] => e
[2] => l
[3] => f
[4] =>
[5] => h
[6] => e
[7] => b
[8] =>
[9] => i
[10] => k
[11] =>
[12] => m
[13] => a
[14] => a
[15] => r
[16] =>
[17] => e
[18] => ́
[19] => e
[20] => ́
[21] => n
[22] =>
[23] => v
[24] => r
[25] => a
[26] => a
[27] => g
[28] => :
[29] =>
[30] => w
[31] => i
[32] => e
[33] =>
[34] => b
[35] => e
[36] => n
[37] =>
[38] => j
[39] => i
[40] => j
[41] => ?
)
This is the best solution!
I found this solution in the PHP manual pages.
It is fast: in PHP 5.6.18 it split a 6 MB text file in a matter of seconds.
Best of all, it doesn't need multibyte (mb_) support!
A similar answer is also available here.
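The code from the manual page did not survive in this copy. As a hedged sketch of one mb_-free approach (not necessarily the one from the manual), assuming PCRE is built with UTF-8 support for the /u modifier: an empty pattern with /u matches at every code-point boundary, and PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends. The function name utf8SplitPcre is hypothetical.

```php
<?php
// Hypothetical helper: split a UTF-8 string into code points without mb_ functions.
// The empty pattern '//u' matches between every pair of code points; the /u modifier
// makes PCRE step over whole UTF-8 sequences instead of single bytes.
function utf8SplitPcre(string $str): array
{
    $parts = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on error, e.g. when the input is not valid UTF-8
    return $parts === false ? [] : $parts;
}

print_r(utf8SplitPcre("\u{00E9}\u{00E9}n vraag"));
```

This still works per code point, so an e followed by a combining accent comes out as two entries.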
For the mb_... functions you should specify the character encoding. In your example code this applies especially to the following two lines:
The full picture:
This is needed because you're using UTF-8 here. However, if the input is not properly encoded, this won't work "any longer", simply because it was not designed for anything else.
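The two lines and the full code referred to above were lost from this copy. A minimal sketch of what passing the encoding explicitly could look like, assuming the input is UTF-8: the question's function with 'UTF-8' given to both mb_strlen() and mb_substr(), so the result no longer depends on mbstring's internal-encoding setting.

```php
<?php
// Sketch: the question's utf8Split() with the encoding passed explicitly to
// both mb_ calls, so the result does not depend on the configured internal encoding.
function utf8Split(string $str, int $len = 1): array
{
    $arr = [];
    $strLen = mb_strlen($str, 'UTF-8');
    // Step by $len so chunks don't overlap when $len > 1
    for ($i = 0; $i < $strLen; $i += $len) {
        $arr[] = mb_substr($str, $i, $len, 'UTF-8');
    }
    return $arr;
}

print_r(utf8Split("\u{00E9}\u{00E9}n"));   // elements: é, é, n
print_r(utf8Split("vraag", 2));            // elements: vr, aa, g
```

Note that this counts code points: a decomposed é (e plus a combining accent) is still two entries.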
You can alternatively process UTF-8 encoded strings with PCRE regular expressions; for example, this returns what you're looking for in less code:
Next to preg_split there is also mb_split.
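Since the question's string turned out to contain an e followed by a combining accent, a PCRE variant that may be closer to "characters" as a reader sees them is matching grapheme clusters with \X. This is a hedged sketch, assuming PCRE with UTF-8 support; the function name graphemeSplit is hypothetical, and this is not necessarily the code from the answer above.

```php
<?php
// Hypothetical helper: split into grapheme clusters (user-perceived characters).
// \X matches a base character together with any combining marks that follow it,
// so "e" + U+0301 stays one element instead of two.
function graphemeSplit(string $str): array
{
    // preg_match_all() returns false on error (e.g. invalid UTF-8 input)
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// "één" in decomposed form: each é is 'e' + U+0301 COMBINING ACUTE ACCENT
$decomposed = "e\u{0301}e\u{0301}n";
print_r(graphemeSplit($decomposed));   // 3 elements instead of 5
```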
If you're not sure whether the mbstring extension is available, use:
Version 1:
Version 2:
Both functions were tested in PHP 5.
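The two versions themselves were not preserved here. As a hedged illustration of splitting without mbstring (and without PCRE), here is a minimal sketch that reads each UTF-8 lead byte to decide how many bytes the character occupies. The function name utf8SplitBytes is hypothetical, and the sketch assumes valid UTF-8: it does not check continuation bytes.

```php
<?php
// Hypothetical helper: split a UTF-8 string into code points using only byte
// inspection (no mbstring, no PCRE). Assumes the input is valid UTF-8.
function utf8SplitBytes(string $str): array
{
    $chars = [];
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        // The lead byte's high bits give the length of the sequence
        if ($byte < 0x80) {
            $n = 1;                          // 0xxxxxxx: ASCII
        } elseif (($byte & 0xE0) === 0xC0) {
            $n = 2;                          // 110xxxxx
        } elseif (($byte & 0xF0) === 0xE0) {
            $n = 3;                          // 1110xxxx
        } elseif (($byte & 0xF8) === 0xF0) {
            $n = 4;                          // 11110xxx
        } else {
            $n = 1;                          // invalid lead byte: pass it through alone
        }
        $chars[] = substr($str, $i, $n);
        $i += $n;
    }
    return $chars;
}

print_r(utf8SplitBytes("\u{00E9}\u{00E9}n"));
```

Like the mb_ version, this works per code point, so e followed by a combining accent is still two entries.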
There is a multibyte split function in PHP, mb_split.
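A hedged note on mb_split: it splits on a regular-expression delimiter (the pattern is given without delimiters), so it is suited to splitting text into words rather than into single characters. A minimal sketch, assuming mbstring was built with its regex support:

```php
<?php
// mb_split() splits on a regex delimiter, much like preg_split() without
// pattern delimiters; here it splits on runs of whitespace.
mb_regex_encoding('UTF-8');
$words = mb_split('\s+', "Zelf heb ik maar \u{00E9}\u{00E9}n vraag");
print_r($words);
```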
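I found out that the é was not the character I expected. Apparently there is a difference between é as a single precomposed character (U+00E9) and e followed by a combining acute accent (U+0301), which is what my string contained. I got it working by normalizing the string first.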
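A minimal sketch of that normalization step, assuming the intl extension (which provides the Normalizer class) is installed: normalizing to Unicode form C composes e + U+0301 into the single character U+00E9, so the string goes from 5 code points to 3 before it is split.

```php
<?php
// Requires the intl extension, which provides the Normalizer class.
$decomposed = "e\u{0301}e\u{0301}n";   // "één" as base letters + combining accents
$composed = Normalizer::normalize($decomposed, Normalizer::FORM_C);

echo mb_strlen($decomposed, 'UTF-8'), "\n";   // 5 code points
echo mb_strlen($composed, 'UTF-8'), "\n";     // 3 code points
var_dump($composed === "\u{00E9}\u{00E9}n");  // bool(true)
```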
46 arrays - off 41 arrays
Since PHP 7.4, you can use mb_str_split:
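A short usage sketch, assuming PHP 7.4+ with mbstring. The second argument is the chunk length in characters; the third is the encoding (passed explicitly here rather than relying on the internal encoding).

```php
<?php
// mb_str_split() is available from PHP 7.4 (mbstring extension).
$chars = mb_str_split("\u{00E9}\u{00E9}n vraag", 1, 'UTF-8');
print_r($chars);

// The second argument sets the chunk length in characters
$pairs = mb_str_split('vraag', 2, 'UTF-8');   // ['vr', 'aa', 'g']
print_r($pairs);
```

It still splits per code point, so for input containing combining accents like the question's it is worth normalizing the string first, as in the answer above.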