Balanced word wrap (minimum raggedness) in PHP
I'm going to write a word-wrap algorithm in PHP. I want to split small chunks of text (short phrases) into n lines of at most m characters each (n is not given, so there will be as many lines as needed). The twist is that the line lengths (in characters) must be as balanced as possible across lines.
Example of input text:
How to do things
Wrong output (this is the normal word-wrap behavior), m=6:
How to
do
things
Desired output, still with m=6:
How
to do
things
Does anyone have suggestions or guidelines on how to implement this function? Basically, I'm looking for something to pretty-print short phrases on two or three lines of as nearly equal length as possible.
Update: It seems I'm looking for exactly a minimum raggedness word-wrap algorithm. But I can't find any implementation in a real programming language (any language will do; I can then convert it to PHP).
Update 2: I started a bounty for this. Is it possible that no public implementation of the minimum raggedness algorithm exists in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find is a bunch of (generic) equations that still need an optimal search procedure. I would also be grateful for an implementation that only approximates that optimal search.
I've implemented it along the same lines as Alex, coding the Wikipedia algorithm, but directly in PHP (an interesting exercise for me). Understanding how to use the optimal cost function f(j), i.e. the 'recurrence' part, is not easy. Thanks to Alex for the well-commented code.
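One common way to write the recurrence, in a prefix form that may differ in notation from Wikipedia's: let the words have lengths ℓ_1, ..., ℓ_n, let m be the maximum line width, and let f(j) be the cheapest cost of wrapping the first j words. Each f(j) tries every possible start i of the last line, reusing the already computed f(i - 1):

```latex
\begin{aligned}
L(i,j) &= \sum_{k=i}^{j} \ell_k + (j - i) && \text{(words } i..j \text{ plus single spaces)}\\
c(i,j) &= \begin{cases} \bigl(m - L(i,j)\bigr)^2 & \text{if } L(i,j) \le m\\ \infty & \text{otherwise} \end{cases}\\
f(0) &= 0, \qquad f(j) = \min_{1 \le i \le j} \bigl( f(i-1) + c(i,j) \bigr)
\end{aligned}
```

The answer is f(n); remembering which i attained each minimum gives the line breaks. For "How to do things" with m = 6 this yields f(4) = 10, reached by the breaks How / to do / things.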
Quick and dirty, in C++:
Test:
EDIT: I just looked at the Wikipedia page for minimum raggedness word wrap and changed the algorithm to the one given there (with squared penalties).
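Why the squared penalty matters can be checked on the question's own example. Assuming the cost of a line is (m - length)² with m = 6 and the last line included, the greedy and balanced layouts score:

```latex
\begin{aligned}
\text{How to / do / things:}\quad & (6-6)^2 + (6-2)^2 + (6-6)^2 = 0 + 16 + 0 = 16\\
\text{How / to do / things:}\quad & (6-3)^2 + (6-5)^2 + (6-6)^2 = 9 + 1 + 0 = 10
\end{aligned}
```

With a plain linear penalty both layouts would tie at 0 + 4 + 0 = 4 and 3 + 1 + 0 = 4: for a fixed number of lines the total slack is always the same, so only a convex penalty such as the square rewards spreading it evenly. Leaving the last line out of the sum (16 vs. 10) does not change the winner here.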
A C version:
Output 1:
Output 2:
Output 3:
I think the simplest way to look at it is as an iteration between limits.
E.g.
Given the input "how to do things"
it outputs
Given the input "Mary had a little lamb"
it outputs
Given the input "This extra-long paragraph was written to demonstrate how the fmt(1) program handles longer inputs. When testing inputs, you don't want them to be too short, nor too long, because the quality of the program can only be determined upon inspection of complex content. The quick brown fox jumps over the lazy dog. Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances.", with a maximum width of 75 characters, it outputs:
Justin's link to Knuth's Breaking Paragraphs Into Lines is the historically best answer. (Newer systems also apply microtypography techniques such as fiddling with character widths, kerning, and so on, but if you're simply looking for monospaced plain-text, these extra approaches won't help.)
If you just want to solve the problem, the fmt(1) utility that the Free Software Foundation supplies on many Linux systems implements a variant of Knuth's algorithm that also takes sentence boundaries into account: it prefers breaking lines at the end of a sentence and tries to avoid a break right after the first word or before the last word of a sentence. I wrote your inputs and a larger example, and ran them through fmt -w 20 to force 20-character lines:
The output looks much better if you allow it the default 75-character width for non-trivial input:
Here is a bash version:
Example:
Requires 'fold' and 'wc', which are usually available wherever bash is installed.
Fun problem. Here's a C# version. I needed this for a mobile game's UI text box and couldn't find anything, so I rolled my own. I was surprised by how complex this problem is, so I'm posting it in the hope that it saves you some time if you need to drop tested C# code into an existing project.
Example: