PHP 中的平衡自动换行(最小粗糙度)

发布于 2025-01-01 00:39:22 字数 923 浏览 0 评论 0原文

我将用 PHP 编写一个自动换行算法。我想将小块文本(短语)分成 n 行,最多 m 个字符(未给出 n ,因此会有根据需要设置尽可能多的行)。其特点是行间长度(以字符为单位)必须尽可能平衡。

输入文本示例:

How to do things

错误的输出(这是正常的自动换行行为),m=6

How to
do
things

所需的输出,始终m=6

How 
to do 
things

有人有建议或指南吗关于这个功能如何实现?基本上,我正在搜索一些在两到三个(尽可能多)等长行上漂亮的印刷短语。


更新:看来我正在精确搜索最小粗糙度自动换行算法。但我找不到任何真正的编程语言的实现(任何人,然后我可以用 PHP 转换它)。


更新 2:我为此启动了赏金。是否有可能在任何过程语言中都不存在最小粗糙度算法的任何公开实现?我需要以可以翻译成程序指令的方式编写一些东西。我现在能找到的只是一堆(通用)方程,但是需要一个最佳的搜索过程。我也将感谢只能近似最佳搜索算法的实现。

I'm going to make a word wrap algorithm in PHP. I want to split small chunks of text (short phrases) in n lines of maximum m characters (n is not given, so there will be as much lines as needed). The peculiarity is that lines length (in characters) has to be much balanced as possible across lines.

Example of input text:

How to do things

Wrong output (this is the normal word-wrap behavior), m=6:

How to
do
things

Desired output, always m=6:

How 
to do 
things

Does anyone have suggestions or guidelines on how to implement this function? Basically, I'm searching something for pretty print short phrases on two or three (as much as possible) equal length lines.


Update: It seems I'm searching exactly for a Minimum raggedness word wrap algorithm. But I can't find any implementation in a real programming language (anyone, then I can convert it in PHP).


Update 2: I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searching procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

帅冕 2025-01-08 00:39:22

我已经在 Alex 的同一行上实现了,编码维基百科算法,但直接用 PHP 实现(对我来说这是一个有趣的练习)。理解如何使用最优成本函数f(j),即“重复”部分,并不是很容易。感谢 Alex 提供了注释良好的代码。

/** 
 * minimumRaggedness
 *
 * @param string $input paragraph. Each word separed by 1 space.
 * @param int $LineWidth the max chars per line.
 * @param string $lineBreak wrapped lines separator.
 * 
 * @return string $output the paragraph wrapped.
 */
function minimumRaggedness($input, $LineWidth, $lineBreak = "\n")
{
    $words = explode(" ", $input);
    $wsnum = count($words);
    $wslen = array_map("strlen", $words);
    $inf = 1000000; //PHP_INT_MAX;

    // keep Costs
    $C = array();

    for ($i = 0; $i < $wsnum; ++$i)
    {
        $C[] = array();
        for ($j = $i; $j < $wsnum; ++$j)
        {
            $l = 0;
            for ($k = $i; $k <= $j; ++$k)
                $l += $wslen[$k];
            $c = $LineWidth - ($j - $i) - $l;
            if ($c < 0)
                $c = $inf;
            else
                $c = $c * $c;
            $C[$i][$j] = $c;
        }
    }

    // apply recurrence
    $F = array();
    $W = array();
    for ($j = 0; $j < $wsnum; ++$j)
    {
        $F[$j] = $C[0][$j];
        $W[$j] = 0;
        if ($F[$j] == $inf)
        {
            for ($k = 0; $k < $j; ++$k)
            {
                $t = $F[$k] + $C[$k + 1][$j];
                if ($t < $F[$j])
                {
                    $F[$j] = $t;
                    $W[$j] = $k + 1;
                }
            }
        }
    }

    // rebuild wrapped paragraph
    $output = "";
    if ($F[$wsnum - 1] < $inf)
    {
        $S = array();
        $j = $wsnum - 1;
        for ( ; ; )
        {
            $S[] = $j;
            $S[] = $W[$j];
            if ($W[$j] == 0)
                break;
            $j = $W[$j] - 1;
        }

        $pS = count($S) - 1;
        do
        {
            $i = $S[$pS--];
            $j = $S[$pS--];
            for ($k = $i; $k < $j; $k++)
                $output .= $words[$k] . " ";
            $output .= $words[$k] . $lineBreak;
        }
        while ($j < $wsnum - 1);
    }
    else
        $output = $input;

    return $output;
}

?>

I've implemented on the same lines of Alex, coding the Wikipedia algorithm, but directly in PHP (an interesting exercise to me). Understanding how to use the optimal cost function f(j), i.e. the 'recurrence' part, is not very easy. Thanks to Alex for the well commented code.

/** 
 * minimumRaggedness
 *
 * @param string $input paragraph. Each word separed by 1 space.
 * @param int $LineWidth the max chars per line.
 * @param string $lineBreak wrapped lines separator.
 * 
 * @return string $output the paragraph wrapped.
 */
function minimumRaggedness($input, $LineWidth, $lineBreak = "\n")
{
    $words = explode(" ", $input);
    $wsnum = count($words);
    $wslen = array_map("strlen", $words);
    $inf = 1000000; //PHP_INT_MAX;

    // keep Costs
    $C = array();

    for ($i = 0; $i < $wsnum; ++$i)
    {
        $C[] = array();
        for ($j = $i; $j < $wsnum; ++$j)
        {
            $l = 0;
            for ($k = $i; $k <= $j; ++$k)
                $l += $wslen[$k];
            $c = $LineWidth - ($j - $i) - $l;
            if ($c < 0)
                $c = $inf;
            else
                $c = $c * $c;
            $C[$i][$j] = $c;
        }
    }

    // apply recurrence
    $F = array();
    $W = array();
    for ($j = 0; $j < $wsnum; ++$j)
    {
        $F[$j] = $C[0][$j];
        $W[$j] = 0;
        if ($F[$j] == $inf)
        {
            for ($k = 0; $k < $j; ++$k)
            {
                $t = $F[$k] + $C[$k + 1][$j];
                if ($t < $F[$j])
                {
                    $F[$j] = $t;
                    $W[$j] = $k + 1;
                }
            }
        }
    }

    // rebuild wrapped paragraph
    $output = "";
    if ($F[$wsnum - 1] < $inf)
    {
        $S = array();
        $j = $wsnum - 1;
        for ( ; ; )
        {
            $S[] = $j;
            $S[] = $W[$j];
            if ($W[$j] == 0)
                break;
            $j = $W[$j] - 1;
        }

        $pS = count($S) - 1;
        do
        {
            $i = $S[$pS--];
            $j = $S[$pS--];
            for ($k = $i; $k < $j; $k++)
                $output .= $words[$k] . " ";
            $output .= $words[$k] . $lineBreak;
        }
        while ($j < $wsnum - 1);
    }
    else
        $output = $input;

    return $output;
}

?>

甜柠檬 2025-01-08 00:39:22

快速而肮脏,在 c++

#include <sstream>
#include <iostream>
#include <vector>
#include <cstdlib>
#include <memory.h>

using namespace std;

int cac[1000][1000];
string res[1000][1000];
vector<string> words;
int M;

int go(int a, int b){
    if(cac[a][b]>= 0) return cac[a][b];
    if(a == b) return 0;

    int csum = -1;
    for(int i=a; i<b; ++i){
    csum += words[i].size() + 1;
    }
    if(csum <= M || a == b-1){
    string sep = "";
        for(int i=a; i<b; ++i){
            res[a][b].append(sep);
            res[a][b].append(words[i]);
            sep = " ";
    }
    return cac[a][b] = (M-csum)*(M-csum);
    }

    int ret = 1000000000;
    int best_sp = -1;
    for(int sp=a+1; sp<b; ++sp){
    int cur = go(a, sp) + go(sp,b);
    if(cur <= ret){
        ret = cur;
        best_sp = sp;
    }
    }
    res[a][b] = res[a][best_sp] + "\n" + res[best_sp][b];
    return cac[a][b] = ret;
}


int main(int argc, char ** argv){
    memset(cac, -1, sizeof(cac));
    M = atoi(argv[1]);
    string word;
    while(cin >> word) words.push_back(word);
    go(0, words.size());
    cout << res[0][words.size()] << endl;
}

测试中:

$ echo "The quick brown fox jumps over a lazy dog" |./a.out 10
The quick
brown fox
jumps over
a lazy dog

编辑:刚刚查看了维基百科页面以获取最小的粗糙度自动换行。将算法更改为给定算法(带有平方惩罚)

Quick and dirty, in c++

#include <sstream>
#include <iostream>
#include <vector>
#include <cstdlib>
#include <memory.h>

using namespace std;

int cac[1000][1000];
string res[1000][1000];
vector<string> words;
int M;

int go(int a, int b){
    if(cac[a][b]>= 0) return cac[a][b];
    if(a == b) return 0;

    int csum = -1;
    for(int i=a; i<b; ++i){
    csum += words[i].size() + 1;
    }
    if(csum <= M || a == b-1){
    string sep = "";
        for(int i=a; i<b; ++i){
            res[a][b].append(sep);
            res[a][b].append(words[i]);
            sep = " ";
    }
    return cac[a][b] = (M-csum)*(M-csum);
    }

    int ret = 1000000000;
    int best_sp = -1;
    for(int sp=a+1; sp<b; ++sp){
    int cur = go(a, sp) + go(sp,b);
    if(cur <= ret){
        ret = cur;
        best_sp = sp;
    }
    }
    res[a][b] = res[a][best_sp] + "\n" + res[best_sp][b];
    return cac[a][b] = ret;
}


int main(int argc, char ** argv){
    memset(cac, -1, sizeof(cac));
    M = atoi(argv[1]);
    string word;
    while(cin >> word) words.push_back(word);
    go(0, words.size());
    cout << res[0][words.size()] << endl;
}

Test:

$ echo "The quick brown fox jumps over a lazy dog" |./a.out 10
The quick
brown fox
jumps over
a lazy dog

EDIT: just looked at the wikipedia page for minimum raggedness word wrap. Changed algorithm to the given one (with squared penalties)

紙鸢 2025-01-08 00:39:22

AC 版本:

// This is a direct implementation of the minimum raggedness word wrapping
// algorithm from http://en.wikipedia.org/wiki/Word_wrap#Minimum_raggedness

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include <limits.h>

const char* pText = "How to do things";
int LineWidth = 6;
int WordCnt;
const char** pWords;
int* pWordLengths;

int* pC;
int* pF;
int* pW;
int* pS;

int CountWords(const char* p)
{
  int cnt = 0;

  while (*p != '\0')
  {
    while (*p != '\0' && isspace(*p)) p++;

    if (*p != '\0')
    {
      cnt++;
      while (*p != '\0' && !isspace(*p)) p++;
    }
  }

  return cnt;
}

void FindWords(const char* p, int cnt, const char** pWords, int* pWordLengths)
{
  while (*p != '\0')
  {
    while (*p != '\0' && isspace(*p)) p++;

    if (*p != '\0')
    {
      *pWords++ = p;
      while (*p != '\0' && !isspace(*p)) p++;
      *pWordLengths++ = p - pWords[-1];
    }
  }
}

void PrintWord(const char* p, int l)
{
  int i;

  for (i = 0; i < l; i++)
    printf("%c", p[i]);
}

// 1st program's argument is the text
// 2nd program's argument is the line width
int main(int argc, char* argv[])
{
  int i, j;

  if (argc >= 3)
  {
    pText = argv[1];
    LineWidth = atoi(argv[2]);
  }

  WordCnt = CountWords(pText);

  pWords = malloc(WordCnt * sizeof(*pWords));
  pWordLengths = malloc(WordCnt * sizeof(*pWordLengths));

  FindWords(pText, WordCnt, pWords, pWordLengths);

  printf("Input Text: \"%s\"\n", pText);
  printf("Line Width: %d\n", LineWidth);
  printf("Words     : %d\n", WordCnt);

#if 0
  for (i = 0; i < WordCnt; i++)
  {
    printf("\"");
    PrintWord(pWords[i], pWordLengths[i]);
    printf("\"\n");
  }
#endif

  // Build c(i,j) in pC[]
  pC = malloc(WordCnt * WordCnt * sizeof(int));
  for (i = 0; i < WordCnt; i++)
  {
    for (j = 0; j < WordCnt; j++)
      if (j >= i)
      {
        int k;
        int c = LineWidth - (j - i);
        for (k = i; k <= j; k++) c -= pWordLengths[k];
        c = (c >= 0) ? c * c : INT_MAX;
        pC[j * WordCnt + i] = c;
      }
      else
        pC[j * WordCnt + i] = INT_MAX;
  }

  // Build f(j) in pF[] and store the wrap points in pW[]
  pF = malloc(WordCnt * sizeof(int));
  pW = malloc(WordCnt * sizeof(int));
  for (j = 0; j < WordCnt; j++)
  {
    pW[j] = 0;
    if ((pF[j] = pC[j * WordCnt]) == INT_MAX)
    {
      int k;
      for (k = 0; k < j; k++)
      {
        int s;
        if (pF[k] == INT_MAX || pC[j * WordCnt + k + 1] == INT_MAX)
          s = INT_MAX;
        else
          s = pF[k] + pC[j * WordCnt + k + 1];
        if (pF[j] > s)
        {
          pF[j] = s;
          pW[j] = k + 1;
        }
      }
    }
  }

  // Print the optimal solution cost
  printf("f         : %d\n", pF[WordCnt - 1]);

  // Print the optimal solution, if any
  pS = malloc(2 * WordCnt * sizeof(int));
  if (pF[WordCnt - 1] != INT_MAX)
  {
    // Work out the solution's words by back tracking the
    // wrap points from pW[] and store them on the pS[] stack
    j = WordCnt - 1;
    for (;;)
    {
      *pS++ = j;
      *pS++ = pW[j];
      if (!pW[j]) break;
      j = pW[j] - 1;
    }
    // Print the solution line by line, word by word
    // in direct order
    do
    {
      int k;
      i = *--pS;
      j = *--pS;
      for (k = i; k <= j; k++)
      {
        PrintWord(pWords[k], pWordLengths[k]);
        printf(" ");
      }
      printf("\n");
    } while (j < WordCnt - 1);
  }

  return 0;
}

输出 1:

ww.exe
Input Text: "How to do things"
Line Width: 6
Words     : 4
f         : 10
How
to do
things

输出 2:

ww.exe "aaa bb cc ddddd" 6
Input Text: "aaa bb cc ddddd"
Line Width: 6
Words     : 4
f         : 11
aaa
bb cc
ddddd

输出 3:

ww.exe "I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searhing procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm." 60
Input Text: "I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searhing procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm."
Line Width: 60
Words     : 73
f         : 241
I started a bounty for this. Is it possible that do not
exist any public implementation of Minimum raggedness
algorithm in any procedural language? I need something
written in a way that can be translated into procedural
instructions. All I can find now is just a bounch of
(generic) equation that however need a optimal searhing
procedure. I will be grateful also for an implementation
that can only approximate that optimal searching algorithm.

A C version:

// This is a direct implementation of the minimum raggedness word wrapping
// algorithm from http://en.wikipedia.org/wiki/Word_wrap#Minimum_raggedness

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include <limits.h>

const char* pText = "How to do things";
int LineWidth = 6;
int WordCnt;
const char** pWords;
int* pWordLengths;

int* pC;
int* pF;
int* pW;
int* pS;

int CountWords(const char* p)
{
  int cnt = 0;

  while (*p != '\0')
  {
    while (*p != '\0' && isspace(*p)) p++;

    if (*p != '\0')
    {
      cnt++;
      while (*p != '\0' && !isspace(*p)) p++;
    }
  }

  return cnt;
}

void FindWords(const char* p, int cnt, const char** pWords, int* pWordLengths)
{
  while (*p != '\0')
  {
    while (*p != '\0' && isspace(*p)) p++;

    if (*p != '\0')
    {
      *pWords++ = p;
      while (*p != '\0' && !isspace(*p)) p++;
      *pWordLengths++ = p - pWords[-1];
    }
  }
}

void PrintWord(const char* p, int l)
{
  int i;

  for (i = 0; i < l; i++)
    printf("%c", p[i]);
}

// 1st program's argument is the text
// 2nd program's argument is the line width
int main(int argc, char* argv[])
{
  int i, j;

  if (argc >= 3)
  {
    pText = argv[1];
    LineWidth = atoi(argv[2]);
  }

  WordCnt = CountWords(pText);

  pWords = malloc(WordCnt * sizeof(*pWords));
  pWordLengths = malloc(WordCnt * sizeof(*pWordLengths));

  FindWords(pText, WordCnt, pWords, pWordLengths);

  printf("Input Text: \"%s\"\n", pText);
  printf("Line Width: %d\n", LineWidth);
  printf("Words     : %d\n", WordCnt);

#if 0
  for (i = 0; i < WordCnt; i++)
  {
    printf("\"");
    PrintWord(pWords[i], pWordLengths[i]);
    printf("\"\n");
  }
#endif

  // Build c(i,j) in pC[]
  pC = malloc(WordCnt * WordCnt * sizeof(int));
  for (i = 0; i < WordCnt; i++)
  {
    for (j = 0; j < WordCnt; j++)
      if (j >= i)
      {
        int k;
        int c = LineWidth - (j - i);
        for (k = i; k <= j; k++) c -= pWordLengths[k];
        c = (c >= 0) ? c * c : INT_MAX;
        pC[j * WordCnt + i] = c;
      }
      else
        pC[j * WordCnt + i] = INT_MAX;
  }

  // Build f(j) in pF[] and store the wrap points in pW[]
  pF = malloc(WordCnt * sizeof(int));
  pW = malloc(WordCnt * sizeof(int));
  for (j = 0; j < WordCnt; j++)
  {
    pW[j] = 0;
    if ((pF[j] = pC[j * WordCnt]) == INT_MAX)
    {
      int k;
      for (k = 0; k < j; k++)
      {
        int s;
        if (pF[k] == INT_MAX || pC[j * WordCnt + k + 1] == INT_MAX)
          s = INT_MAX;
        else
          s = pF[k] + pC[j * WordCnt + k + 1];
        if (pF[j] > s)
        {
          pF[j] = s;
          pW[j] = k + 1;
        }
      }
    }
  }

  // Print the optimal solution cost
  printf("f         : %d\n", pF[WordCnt - 1]);

  // Print the optimal solution, if any
  pS = malloc(2 * WordCnt * sizeof(int));
  if (pF[WordCnt - 1] != INT_MAX)
  {
    // Work out the solution's words by back tracking the
    // wrap points from pW[] and store them on the pS[] stack
    j = WordCnt - 1;
    for (;;)
    {
      *pS++ = j;
      *pS++ = pW[j];
      if (!pW[j]) break;
      j = pW[j] - 1;
    }
    // Print the solution line by line, word by word
    // in direct order
    do
    {
      int k;
      i = *--pS;
      j = *--pS;
      for (k = i; k <= j; k++)
      {
        PrintWord(pWords[k], pWordLengths[k]);
        printf(" ");
      }
      printf("\n");
    } while (j < WordCnt - 1);
  }

  return 0;
}

Output 1:

ww.exe
Input Text: "How to do things"
Line Width: 6
Words     : 4
f         : 10
How
to do
things

Output 2:

ww.exe "aaa bb cc ddddd" 6
Input Text: "aaa bb cc ddddd"
Line Width: 6
Words     : 4
f         : 11
aaa
bb cc
ddddd

Output 3:

ww.exe "I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searhing procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm." 60
Input Text: "I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searhing procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm."
Line Width: 60
Words     : 73
f         : 241
I started a bounty for this. Is it possible that do not
exist any public implementation of Minimum raggedness
algorithm in any procedural language? I need something
written in a way that can be translated into procedural
instructions. All I can find now is just a bounch of
(generic) equation that however need a optimal searhing
procedure. I will be grateful also for an implementation
that can only approximate that optimal searching algorithm.
空城旧梦 2025-01-08 00:39:22

我认为最简单的看待它的方法是在限制之间进行迭代,

例如

/** 
 * balancedWordWrap
 *
 * @param string $input
 * @param int $maxWidth the max chars per line
 */
function balancedWordWrap($input, $maxWidth = null) {
    $length = strlen($input);
    if (!$maxWidth) {
        $maxWidth = min(ceil($length / 2), 75);
    }   
    $minWidth = min(ceil($length / 2), $maxWidth / 2); 

    $permutations = array();
    $scores = array();
    $lowestScore = 999;
    $lowest = $minWidth;

    foreach(range($minWidth, $maxWidth) as $width) {
        $permutations[$width] = wordwrap($input, $width);
        $lines = explode("\n", $permutations[$width]);

        $max = 0;
        foreach($lines as $line) {
            $lineLength = strlen($line);
            if ($lineLength > $max) {
                $max = $lineLength;
            }   
        }   

        $score = 0;
        foreach($lines as $line) {
            $lineLength = strlen($line);
            $score += pow($max - $lineLength, 2); 
        }   

        $scores[$width] = $score;
        if ($score < $lowestScore) {
            $lowestScore = $score;
            $lowest = $width;
        }   
    }   

    return $permutations[$lowest];
} 

给定输入“如何做事”,

它输出

How
to do
things

给定输入“玛丽有一只小羊羔”,

它输出

Mary had a
little lamb

给定输入“这个超长的段落是为了演示 fmt(1) 程序是如何编写的处理较长的输入。当测试输入时,您不希望它们太短,也不要太长,因为程序的质量只能通过检查复杂的内容来确定。国会不得制定任何关于建立宗教,或禁止行使宗教自由;或剥夺言论自由或新闻自由;或人民和平集会以及向政府请愿伸冤的权利。”,最多 75 个字符宽度,它输出:

This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them 
be too short, nor too long, because the quality of the program can only be
determined upon inspection of complex content. The quick brown fox jumps
over the lazy dog. Congress shall make no law respecting an establishment
of religion, or prohibiting the free exercise thereof; or abridging the
freedom of speech, or of the press; or the right of the people peaceably
to assemble, and to petition the Government for a redress of grievances.

I think the simplest way to look at it - is with iteration between limits

E.g.

/** 
 * balancedWordWrap
 *
 * @param string $input
 * @param int $maxWidth the max chars per line
 */
function balancedWordWrap($input, $maxWidth = null) {
    $length = strlen($input);
    if (!$maxWidth) {
        $maxWidth = min(ceil($length / 2), 75);
    }   
    $minWidth = min(ceil($length / 2), $maxWidth / 2); 

    $permutations = array();
    $scores = array();
    $lowestScore = 999;
    $lowest = $minWidth;

    foreach(range($minWidth, $maxWidth) as $width) {
        $permutations[$width] = wordwrap($input, $width);
        $lines = explode("\n", $permutations[$width]);

        $max = 0;
        foreach($lines as $line) {
            $lineLength = strlen($line);
            if ($lineLength > $max) {
                $max = $lineLength;
            }   
        }   

        $score = 0;
        foreach($lines as $line) {
            $lineLength = strlen($line);
            $score += pow($max - $lineLength, 2); 
        }   

        $scores[$width] = $score;
        if ($score < $lowestScore) {
            $lowestScore = $score;
            $lowest = $width;
        }   
    }   

    return $permutations[$lowest];
} 

Given the input "how to do things"

it outputs

How
to do
things

Given the input "Mary had a little lamb"

it outputs

Mary had a
little lamb

Given the input "This extra-long paragraph was writtin to demonstrate how the fmt(1) program handles longer inputs. When testing inputs, you don\'t want them to be too short, nor too long, because the quality of the program can only be determined upon inspection of complex content. The quick brown fox jumps over the lazy dog. Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances.", and limited to 75 chars max width, it outputs:

This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them 
be too short, nor too long, because the quality of the program can only be
determined upon inspection of complex content. The quick brown fox jumps
over the lazy dog. Congress shall make no law respecting an establishment
of religion, or prohibiting the free exercise thereof; or abridging the
freedom of speech, or of the press; or the right of the people peaceably
to assemble, and to petition the Government for a redress of grievances.
╭ゆ眷念 2025-01-08 00:39:22

贾斯汀的链接Knuth 的《将段落分成几行》 是历史上最佳 答案。 (较新的系统还应用微型排版技术,例如调整字符宽度、字距调整等,但如果您只是寻找等宽纯文本,这些额外的方法将无济于事。)

如果您只是想解决问题,许多 Linux 系统上提供的 fmt(1) 实用程序自由软件基金会实现了 Knuth 算法的一个变体,该算法也尝试避免句子末尾的换行。我编写了您的输入和一个更大的示例,并通过 fmt -w 20 运行它们以强制使用 20 个字符的行:

$ fmt -w 20 input 
Lorem ipsum dolor
sit amet

Supercalifragilisticexpialidocious
and some other
small words

One long
extra-long-word

This extra-long
paragraph
was writtin to
demonstrate how the
`fmt(1)` program
handles longer
inputs. When
testing inputs,
you don't want them
to be too short,
nor too long,
because the quality
of the program can
only be determined
upon inspection
of complex
content. The quick
brown fox jumps
over the lazy
dog. Congress
shall make no
law respecting
an establishment
of religion, or
prohibiting the
free exercise
thereof; or
abridging the
freedom of speech,
or of the press;
or the right of the
people peaceably
to assemble,
and to petition
the Government
for a redress of
grievances.

如果您允许非平凡输入的默认 75 个字符宽度,则输出看起来会更好:

$ fmt input 
Lorem ipsum dolor sit amet

Supercalifragilisticexpialidocious and some other small words

One long extra-long-word

This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them
to be too short, nor too long, because the quality of the program can
only be determined upon inspection of complex content. The quick brown
fox jumps over the lazy dog. Congress shall make no law respecting an
establishment of religion, or prohibiting the free exercise thereof;
or abridging the freedom of speech, or of the press; or the right of
the people peaceably to assemble, and to petition the Government for a
redress of grievances.

Justin's link to Knuth's Breaking Paragraphs Into Lines is the historically best answer. (Newer systems also apply microtypography techniques such as fiddling with character widths, kerning, and so on, but if you're simply looking for monospaced plain-text, these extra approaches won't help.)

If you just want to solve the problem, the fmt(1) utility supplied on many Linux systems by the Free Software Foundation implements a variant of Knuth's algorithm that also attempts to avoid line breaks at the end of sentences. I wrote your inputs and a larger example, and ran them through fmt -w 20 to force 20-character lines:

$ fmt -w 20 input 
Lorem ipsum dolor
sit amet

Supercalifragilisticexpialidocious
and some other
small words

One long
extra-long-word

This extra-long
paragraph
was writtin to
demonstrate how the
`fmt(1)` program
handles longer
inputs. When
testing inputs,
you don't want them
to be too short,
nor too long,
because the quality
of the program can
only be determined
upon inspection
of complex
content. The quick
brown fox jumps
over the lazy
dog. Congress
shall make no
law respecting
an establishment
of religion, or
prohibiting the
free exercise
thereof; or
abridging the
freedom of speech,
or of the press;
or the right of the
people peaceably
to assemble,
and to petition
the Government
for a redress of
grievances.

The output looks much better if you allow it the default 75 characters width for non-trivial input:

$ fmt input 
Lorem ipsum dolor sit amet

Supercalifragilisticexpialidocious and some other small words

One long extra-long-word

This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them
to be too short, nor too long, because the quality of the program can
only be determined upon inspection of complex content. The quick brown
fox jumps over the lazy dog. Congress shall make no law respecting an
establishment of religion, or prohibiting the free exercise thereof;
or abridging the freedom of speech, or of the press; or the right of
the people peaceably to assemble, and to petition the Government for a
redress of grievances.
冷了相思 2025-01-08 00:39:22

这是一个 bash 版本:

#! /bin/sh
if ! [[ "$1" =~ ^[0-9]+$ ]] ; then
    echo "Usage: balance <width> [ <string> ]"
    echo " "
    echo "  if string is not passed as parameter it will be read from STDIN\n"
    exit 2
elif [ $# -le 1 ] ; then
    LINE=`cat`
else
    LINE="$2"
fi

LINES=`echo "$LINE" | fold -s -w $1 | wc -l`
MAX=$1
MIN=0

while [ $MAX -gt $(($MIN+1)) ]
do
    TRY=$(( $MAX + $MIN >> 1 ))
    NUM=`echo "$LINE" | fold -s -w $TRY | wc -l`
    if [ $NUM -le $LINES ] ; then
        MAX=$TRY
    else
        MIN=$TRY
    fi
done

echo "$LINE" | fold -s -w $MAX

示例:

$ balance 50 "Now is the time for all good men to come to the aid of the party."
Now is the time for all good men 
to come to the aid of the party.

需要 'fold' 和 'wc',它们通常在安装 bash 的地方可用。

Here is a bash version:

#! /bin/sh
if ! [[ "$1" =~ ^[0-9]+$ ]] ; then
    echo "Usage: balance <width> [ <string> ]"
    echo " "
    echo "  if string is not passed as parameter it will be read from STDIN\n"
    exit 2
elif [ $# -le 1 ] ; then
    LINE=`cat`
else
    LINE="$2"
fi

LINES=`echo "$LINE" | fold -s -w $1 | wc -l`
MAX=$1
MIN=0

while [ $MAX -gt $(($MIN+1)) ]
do
    TRY=$(( $MAX + $MIN >> 1 ))
    NUM=`echo "$LINE" | fold -s -w $TRY | wc -l`
    if [ $NUM -le $LINES ] ; then
        MAX=$TRY
    else
        MIN=$TRY
    fi
done

echo "$LINE" | fold -s -w $MAX

example:

$ balance 50 "Now is the time for all good men to come to the aid of the party."
Now is the time for all good men 
to come to the aid of the party.

Requires 'fold' and 'wc' which are usually available where bash is installed.

最美的太阳 2025-01-08 00:39:22

有趣的问题。 这是 ac# 版本。我需要这个用于手机游戏的 UI 文本框。找不到任何东西,所以我自己推出了。我对这个问题的复杂性感到惊讶,所以发布这篇文章希望如果您需要将一些经过测试的 C# 代码放入现有项目中,它可以为您节省一些时间。

例子:

int maxLineLength = 75;
var input = @"This extra-long paragraph was writtin to demonstrate how the `fmt(1)` program handles longer inputs. When testing inputs, you don't want them  be too short, nor too long, because the quality of the program can only be determined upon inspection of complex content. The quick brown fox jumps over the lazy dog. Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances.";
int minLineCount = input.Length / maxLineLength == 0 ? 1 : input.Length / maxLineLength;
int maxLineCount = 2 * minLineCount;
var result = UniformLineSplitting.Split(
    input, maxLineLength, minLineCount, maxLineCount, UniformLineSplitting.Western);
Assert.AreEqual(
@"This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them
be too short, nor too long, because the quality of the program can only be
determined upon inspection of complex content. The quick brown fox jumps
over the lazy dog. Congress shall make no law respecting an establishment
of religion, or prohibiting the free exercise thereof; or abridging the
freedom of speech, or of the press; or the right of the people peaceably
to assemble, and to petition the Government for a redress of grievances.",
    result);

Fun problem. Here's a c# version. I needed this for a mobile game's UI textbox. Couldn't find anything so I rolled my own. I was surprised by the complexity of this problem so posting this hoping it will save you some time if you need to drop some tested c# code into an existing project.

Example:

int maxLineLength = 75;
var input = @"This extra-long paragraph was writtin to demonstrate how the `fmt(1)` program handles longer inputs. When testing inputs, you don't want them  be too short, nor too long, because the quality of the program can only be determined upon inspection of complex content. The quick brown fox jumps over the lazy dog. Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances.";
int minLineCount = input.Length / maxLineLength == 0 ? 1 : input.Length / maxLineLength;
int maxLineCount = 2 * minLineCount;
var result = UniformLineSplitting.Split(
    input, maxLineLength, minLineCount, maxLineCount, UniformLineSplitting.Western);
Assert.AreEqual(
@"This extra-long paragraph was writtin to demonstrate how the `fmt(1)`
program handles longer inputs. When testing inputs, you don't want them
be too short, nor too long, because the quality of the program can only be
determined upon inspection of complex content. The quick brown fox jumps
over the lazy dog. Congress shall make no law respecting an establishment
of religion, or prohibiting the free exercise thereof; or abridging the
freedom of speech, or of the press; or the right of the people peaceably
to assemble, and to petition the Government for a redress of grievances.",
    result);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文