Multi-byte safe wordwrap() function for UTF-8
PHP's wordwrap() function doesn't work correctly for multi-byte strings such as UTF-8, because it measures line length in bytes rather than characters.
There are a few examples of mb-safe functions in the comments, but with some different test data they all seem to have problems.
The function should take exactly the same parameters as wordwrap().
Specifically, be sure it works to:
- cut mid-word if $cut = true, and not cut mid-word otherwise
- not insert extra spaces in words if $break = ' '
- also work for $break = "\n"
- work for ASCII and all valid UTF-8
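A minimal sketch of such a function, assuming PHP 8 with the mbstring extension. The name utf8_wordwrap is hypothetical. It uses the same greedy strategy as wordwrap(): existing $break sequences end a line, only the ASCII space is a word boundary, and words are packed onto a line while they fit in $width characters (counted with mb_strlen() instead of strlen()). Edge cases such as runs of spaces at a wrap point may still differ from the built-in.

```php
<?php
/**
 * Hypothetical multibyte-safe wordwrap() with the same parameters.
 * Widths are counted in UTF-8 characters instead of bytes.
 */
function utf8_wordwrap(string $string, int $width = 75, string $break = "\n", bool $cut = false): string
{
    if ($break === '') {
        throw new ValueError('$break must not be empty');
    }
    if ($cut && $width < 1) {
        throw new ValueError('$cut requires a positive $width');
    }

    $segments = [];
    // Text already separated by $break starts a fresh line, as in wordwrap().
    foreach (explode($break, $string) as $segment) {
        $lines = [];
        $line = null; // null means no word has been placed on this line yet
        foreach (explode(' ', $segment) as $word) {
            if ($line === null) {
                $line = $word;
            } elseif (mb_strlen($line, 'UTF-8') + 1 + mb_strlen($word, 'UTF-8') <= $width) {
                $line .= ' ' . $word; // the word still fits on the current line
            } else {
                $lines[] = $line;     // the separating space becomes $break
                $line = $word;
            }
            // With $cut, chop an over-long word into $width-character pieces.
            while ($cut && mb_strlen($line, 'UTF-8') > $width) {
                $lines[] = mb_substr($line, 0, $width, 'UTF-8');
                $line = mb_substr($line, $width, null, 'UTF-8');
            }
        }
        $lines[] = $line;
        $segments[] = implode($break, $lines);
    }
    return implode($break, $segments);
}

echo utf8_wordwrap("Düsseldorf München Köln", 10), "\n"; // three lines
```

Because only the ASCII space is split on, $break = ' ' leaves words intact unless $cut is true, and for pure ASCII input the result should match wordwrap().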
Answers (10)
I haven't found any code that works for me. Here is what I've written. It works for me, though it is probably not the fastest.
Because no answer was handling every use case, here is something that does. The code is based on Drupal's AbstractStringWrapper::wordWrap.
Total time: 0.0020880699, which is a good time :)
Just want to share an alternative I found on the net.
Using mb_str_split, you can use join to combine the words with <br>. Finally, create your own helper, perhaps mb_textwrap.
See screenshot demo:
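One reading of this approach, as a sketch: cut the string into fixed-size chunks with mb_str_split() (available since PHP 7.4) and join them with <br>. The helper name mb_textwrap is the answer's suggestion; the body below is an assumption. Note that it cuts at a fixed character count regardless of spaces, so it behaves like a hard wrap rather than wordwrap().

```php
<?php
/**
 * Hypothetical mb_textwrap(): splits $text into $width-character chunks
 * with mb_str_split() and joins them with $break. Word boundaries are ignored.
 */
function mb_textwrap(string $text, int $width = 75, string $break = '<br>'): string
{
    return implode($break, mb_str_split($text, $width, 'UTF-8'));
}

echo mb_textwrap("日本語のテキスト", 3), "\n"; // 日本語<br>のテキ<br>スト
```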
Custom word boundaries
Unicode text has many more potential word boundaries than 8-bit encodings, including the 17 space separators and the full-width comma. This solution allows you to customize a list of word boundaries for your application.
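A sketch of what such a customizable list might look like, assuming "the 17 space separators" means the Unicode Zs (Space_Separator) characters and "the full-width comma" is U+FF0C. WORD_BOUNDARIES and boundaryOffsets() are hypothetical names, not the answer's code; a wrapping routine could use the returned offsets as candidate break points.

```php
<?php
/**
 * Hypothetical boundary list: the 17 Unicode space separators (category Zs)
 * plus U+FF0C FULLWIDTH COMMA. Add or remove entries to suit your application.
 */
const WORD_BOUNDARIES = [
    "\u{0020}", "\u{00A0}", "\u{1680}",
    "\u{2000}", "\u{2001}", "\u{2002}", "\u{2003}", "\u{2004}", "\u{2005}",
    "\u{2006}", "\u{2007}", "\u{2008}", "\u{2009}", "\u{200A}",
    "\u{202F}", "\u{205F}", "\u{3000}",
    "\u{FF0C}",
];

/**
 * Returns the character (not byte) offsets at which a boundary occurs.
 */
function boundaryOffsets(string $text, array $boundaries = WORD_BOUNDARIES): array
{
    $isBoundary = array_flip($boundaries); // constant-time membership test
    $offsets = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $i => $char) {
        if (isset($isBoundary[$char])) {
            $offsets[] = $i;
        }
    }
    return $offsets;
}

echo implode(',', boundaryOffsets("a b\u{3000}c\u{FF0C}d")), "\n"; // 1,3,5
```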
Better performance
Have you ever benchmarked the mb_* family of PHP built-ins? They don't scale well at all. By using a custom nextCharUtf8(), we can do the same job, but orders of magnitude faster, especially on large strings.
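The answer's nextCharUtf8() is not shown. Below is a hypothetical version with an assumed signature: it reads the character that starts at a byte offset, using the UTF-8 lead byte to determine its length, and advances the offset. A plausible reason for the speed-up is that calling mb_substr($s, $i, 1) in a loop has to locate character $i from the start of the string on each call, whereas tracking a byte offset walks the string once. The sketch assumes valid UTF-8 and performs no validation.

```php
<?php
/**
 * Hypothetical nextCharUtf8(): returns the UTF-8 character starting at byte
 * $offset and advances $offset past it, or returns null at the end of $s.
 * Assumes $s is valid UTF-8 (no validation is done).
 */
function nextCharUtf8(string $s, int &$offset): ?string
{
    if ($offset >= strlen($s)) {
        return null;
    }
    $lead = ord($s[$offset]);
    if ($lead < 0x80) {
        $len = 1; // 0xxxxxxx: ASCII
    } elseif ($lead < 0xE0) {
        $len = 2; // 110xxxxx
    } elseif ($lead < 0xF0) {
        $len = 3; // 1110xxxx
    } else {
        $len = 4; // 11110xxx
    }
    $char = substr($s, $offset, $len);
    $offset += $len;
    return $char;
}

$text = "aé日😀";
$offset = 0;
while (($char = nextCharUtf8($text, $offset)) !== null) {
    echo $char, ' ', strlen($char), "\n"; // each character and its byte length
}
```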
Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.
Here is the multibyte wordwrap function I have coded, taking inspiration from others found on the internet.
Don't forget to configure PHP to use UTF-8.
I hope this will help.
Guillaume
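The configuration snippet itself is not preserved. One common way to do this, assumed here, is to set mbstring's internal encoding at the top of the script so that mb_* functions default to UTF-8 when no encoding argument is passed. On PHP 5.6 and later, default_charset already defaults to UTF-8 and mbstring falls back to it, so the explicit call mainly matters for older or customized configurations.

```php
<?php
// Assumed setup: make UTF-8 the default encoding for mb_* calls in this script.
mb_internal_encoding('UTF-8');

$name = 'Guillaume é';
echo mb_strlen($name), "\n"; // 11 (characters)
echo strlen($name), "\n";    // 12 (bytes: "é" takes two bytes in UTF-8)
```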
In my case, the input was a Japanese paragraph that needed a line break at around 70 characters, and the solutions given above did not wrap it.
I ended up writing a solution that works for me. I have not tested the snippet for performance.
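Word-based wrappers find no spaces to break at in Japanese, so the paragraph is never wrapped. The answer's code is not shown; one approach, sketched here under the assumption of PHP 7.4+ for mb_str_split(), is to hard-wrap each existing line at a fixed character count. utf8_hardwrap is a hypothetical name. It does not apply Japanese line-breaking rules (kinsoku shori), so a line may begin with punctuation such as 。.

```php
<?php
/**
 * Hypothetical helper: hard-wraps each existing line every $width characters,
 * for text that has no spaces between words (e.g. Japanese).
 */
function utf8_hardwrap(string $text, int $width = 70): string
{
    $out = [];
    foreach (explode("\n", $text) as $line) {
        // Handle empty lines explicitly so blank lines survive the split.
        $chunks = $line === '' ? [''] : mb_str_split($line, $width, 'UTF-8');
        $out[] = implode("\n", $chunks);
    }
    return implode("\n", $out);
}

echo utf8_hardwrap("あいうえおかきくけこ", 4), "\n"; // あいうえ / おかきく / けこ
```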
This one seems to work well...