I had occasion to write a word wrap function recently, and I want to share what I came up with.
I used a TDD approach almost as strict as the one from the Go example. I started with the test that wrapping the string "Hello, world!" at a width of 80 should return "Hello, world!". Clearly, the simplest thing that works is to return the input string untouched. Starting from that, I wrote progressively more complex tests and ended up with a recursive solution that (at least for my purposes) handles the task quite efficiently.
Pseudocode for the recursive solution:
    Function WordWrap (inputString, width)
        Trim the input string of leading and trailing spaces.
        If the trimmed string's length is <= the width,
            Return the trimmed string.
        Else,
            Find the index of the last space in the trimmed string,
                searching backward from position width.
            If no space is found, use the width as the index.
            Split the trimmed string into two pieces at the index.
            Trim trailing spaces from the portion before the index,
                and leading spaces from the portion after the index.
            Concatenate and return:
                the trimmed portion before the index,
                a line break,
                and the result of calling WordWrap on the trimmed portion after
                    the index (with the same width as the original call).
This only wraps at spaces, and if you want to wrap a string that already contains line breaks, you need to split it at the line breaks, send each piece to this function and then reassemble the string. Even so, in VB.NET running on a fast machine, this can handle about 20 MB/second.
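For readers who want something runnable, here is a minimal Python sketch of the pseudocode above. The names `word_wrap` and `wrap_lines` are mine, not from the original VB.NET code, and `wrap_lines` is only one way of doing the split-and-reassemble step described in the last paragraph.

```python
def word_wrap(text, width):
    """Recursively wrap one paragraph at spaces (sketch of the pseudocode)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    text = text.strip(" ")
    if len(text) <= width:
        return text
    # Last space at or before position `width`; rfind returns -1 if there is none.
    index = text.rfind(" ", 0, width + 1)
    if index == -1:
        index = width  # no space to break at, so hard-split the word
    head = text[:index].rstrip(" ")
    tail = text[index:].lstrip(" ")
    return head + "\n" + word_wrap(tail, width)


def wrap_lines(text, width):
    """Wrap text that already contains line breaks, one line at a time."""
    return "\n".join(word_wrap(line, width) for line in text.split("\n"))


print(wrap_lines("aaa bbb ccc\nabcdefghij", 7))
```

Because each wrapped line costs one level of recursion, Python's default recursion limit would be hit on very long paragraphs; an iterative loop over the same steps avoids that.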
Donald E. Knuth did a lot of work on the line breaking algorithm in his TeX typesetting system. This is arguably one of the best algorithms for line breaking - "best" in terms of the visual appearance of the result.
His algorithm avoids the problems of greedy line filling where you can end up with a very dense line followed by a very loose line.
An efficient algorithm can be implemented using dynamic programming. The details are in Knuth and Plass's paper on TeX's line breaking, "Breaking Paragraphs into Lines".
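To make the dynamic programming idea concrete, here is a heavily simplified sketch. It is not TeX's algorithm (there is no stretchable glue, hyphenation, or penalty model); it only minimizes the sum of squared trailing spaces over all lines except the last, which is often called "minimum raggedness". The function name and the cost function are my own assumptions.

```python
def wrap_min_raggedness(words, width):
    """Pick line breaks that minimize the sum of squared slack per line.

    The last line costs nothing. A word longer than `width` is placed
    alone on its own line at zero cost.
    """
    n = len(words)
    # best[i] = minimal cost of laying out words[i:]
    # nxt[i]  = index of the first word on the line after the one starting at i
    best = [float("inf")] * n + [0.0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break  # words[i..j] no longer fit on one line
            cost = 0 if j == n - 1 else max(width - length, 0) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(wrap_min_raggedness("aaa bb cc ddddd".split(), 6))
```

On this input a greedy filler would produce "aaa bb", "cc", "ddddd", leaving one very loose line; the sketch instead returns "aaa", "bb cc", "ddddd", which spreads the slack more evenly.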
I wondered about the same thing for my own editor project. My solution was a two-step process:
Find the line ends and store them in an array.
For very long lines, find suitable break points at roughly 1K intervals and save them in the line array, too. This is to catch the "4 MB text without a single line break".
When you need to display the text, find the lines in question and wrap them on the fly. Remember this information in a cache for quick redraw. When the user scrolls a whole page, flush the cache and repeat.
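A rough Python sketch of these two steps, under my own assumptions: `index_lines` and `WrappedView` are hypothetical names, the standard library's `textwrap.wrap` stands in for whatever wrapping routine the editor really uses, and the cache is a plain dictionary that is cleared on a page scroll.

```python
import textwrap


def index_lines(text, chunk=1024):
    """Return the start offset of every stored segment: one per line, plus
    extra break points roughly every `chunk` characters inside long lines."""
    starts = [0]
    pos = 0
    while True:
        newline = text.find("\n", pos)
        end = len(text) if newline == -1 else newline
        # Guard against the "4 MB without a single line break" case.
        while end - pos > chunk:
            space = text.rfind(" ", pos + 1, pos + chunk)
            pos = pos + chunk if space == -1 else space + 1
            starts.append(pos)
        if newline == -1:
            return starts
        pos = newline + 1
        starts.append(pos)


class WrappedView:
    """Wraps segments on demand and caches the result for quick redraws."""

    def __init__(self, text, width):
        self.text = text
        self.width = width
        self.starts = index_lines(text)
        self.cache = {}

    def segment(self, i):
        more = i + 1 < len(self.starts)
        end = self.starts[i + 1] if more else len(self.text)
        return self.text[self.starts[i]:end].rstrip("\n")

    def rows(self, i):
        if i not in self.cache:
            self.cache[i] = textwrap.wrap(self.segment(i), self.width) or [""]
        return self.cache[i]

    def scroll_page(self):
        self.cache.clear()  # a whole-page scroll invalidates the cached rows


view = WrappedView("hello world\n\nfoo", 5)
print(view.rows(0))
```

Only the segments that are actually on screen ever get wrapped, which is what keeps the display responsive on large files.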
If you can, load and analyze the whole text in a background thread. This way, you can already display the first page of text while the rest of the document is still being examined. The simplest solution here is to cut the first 16 KB of text away and run the algorithm on the substring. This is very fast and allows you to render the first page instantly, even if your editor is still loading the text.
You can use a similar approach when the cursor is initially at the end of the text; just read the last 16 KB of text and analyze that. In this case, use two edit buffers and load all but the last 16 KB into the first while the user is locked into the second buffer. And you'll probably want to remember how many lines the text has when you close the editor, so the scroll bar doesn't look weird.
It gets hairy when the user can start the editor with the cursor somewhere in the middle, but ultimately it's only an extension of the end-of-text problem. You only need to remember the byte position, the current line number, and the total number of lines from the last session, plus you need three edit buffers or an edit buffer where you can cut away 16 KB in the middle.
Alternatively, lock the scrollbar and other interface elements while the text is loading; that allows the user to look at the text while it loads completely.
I don't know of any specific algorithms, but the following could be a rough outline of how it should work:
For the current text size, font, display size, window size, margins, etc., determine how many characters can fit on a line (if fixed-type), or how many pixels can fit on a line (if not fixed-type).
Go through the line character by character, calculating how many characters or pixels have been recorded since the beginning of the line.
When you go over the maximum characters/pixels for the line, move back to the last space/punctuation mark, and move all text to the next line.
Repeat until you go through all text in the document.
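The outline above could look roughly like this in Python. This is a sketch under my own assumptions: `greedy_wrap` is a made-up name, it only breaks at spaces (not at punctuation), and `measure` is a stand-in for whatever width function applies, `len` for fixed-width text or a font-metrics lookup for pixel widths.

```python
def greedy_wrap(text, max_width, measure=len):
    """Walk the text character by character and break at the last space
    once the measured width of the current line exceeds max_width."""
    lines = []
    start = 0        # index where the current line begins
    last_space = -1  # most recent space seen on the current line
    i = 0
    while i < len(text):
        if text[i] == " ":
            last_space = i
        elif measure(text[start:i + 1]) > max_width and i > start:
            if last_space > start:
                # Move back to the last space; the rest goes to the next line.
                lines.append(text[start:last_space].rstrip(" "))
                start = last_space + 1
            else:
                # No space on this line: split the word where it overflows.
                lines.append(text[start:i])
                start = i
            last_space = -1
            continue  # re-check the same character against the new line
        i += 1
    lines.append(text[start:].rstrip(" "))
    return lines


print(greedy_wrap("aaa bbb ccc", 7))
print(greedy_wrap("aaa bbb ccc", 14, measure=lambda s: 2 * len(s)))
```

Re-measuring the whole line for every character keeps the sketch short but makes it quadratic in the line length; a real implementation would keep a running width instead.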
In .NET, word wrapping functionality is built into controls like TextBox. I am sure that a similar built-in functionality exists for other languages as well.
With or without hyphenation?
Without it, it's easy. Just encapsulate your text as one word object per word and give each a method getWidth(). Then start at the first word and add up the row length until it is greater than the available space. When that happens, wrap the last word and start counting again for the next row, beginning with that word, and so on.
With hyphenation, you need hyphenation rules in a common format like: hy-phen-a-tion
Then it's the same as the above, except that you need to split the last word, the one that caused the overflow.
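As an illustration only, here is a small Python sketch of both variants. It assumes every word arrives with its permitted break points already marked by "-" (as in "hy-phen-a-tion"), so a word with no marks is simply never split, and it uses character counts where a real editor would call something like getWidth(). The function name is hypothetical.

```python
def wrap_with_hyphenation(words, width):
    """Greedy wrap; a word that overflows is split at the latest marked
    break point that still fits, with a hyphen added at the line end."""
    lines = []
    current = ""
    for marked in words:
        syllables = marked.split("-")
        word = "".join(syllables)
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            current = candidate
            continue
        # The word overflows: keep as many leading syllables as still fit.
        prefix = current + " " if current else ""
        split_at = 0
        for k in range(len(syllables) - 1, 0, -1):
            if len(prefix + "".join(syllables[:k]) + "-") <= width:
                split_at = k
                break
        if split_at:
            lines.append(prefix + "".join(syllables[:split_at]) + "-")
            current = "".join(syllables[split_at:])
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap_with_hyphenation(["simple", "hy-phen-a-tion", "works"], 10))
```

The sketch does not re-split a remainder that is itself wider than the line, and it cannot tell a real hyphen from a break-point mark; both would need handling in practice.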
A good example and tutorial of how to structure your code for an excellent text editor is given in the Gang of Four Design Patterns book. It's one of the main samples on which they show the patterns.
Here is mine that I was working on today for fun in C:
Here are my considerations:
No copying of characters, just printing to standard output. Therefore, since I don't like to modify the argv[x] arguments, and because I like a challenge, I wanted to do it without modifying it. I did not go for the idea of inserting '\n'.
I don't want
    This line breaks here
to become
    This line breaks
    here
so changing characters to '\n' is not an option given this objective.
If the linewidth is set at say 80, and the 80th character is in the middle of a word, the entire word must be put on the next line. So as you're scanning, you have to remember the position of the end of the last word that didn't go over 80 characters.
So here is mine, it's not clean; I've been breaking my head for the past hour trying to get it to work, adding something here and there. It works for all edge cases that I know of.
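    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    /* Returns 1 if c ends a word (space, tab or end of string), 0 otherwise. */
    int isDelim(char c){
        switch(c){
            case '\0':
            case '\t':
            case ' ' :
                return 1;
                break; /* As a matter of style, put the 'break' anyway even if there is a return above it.*/
            default:
                return 0;
        }
    }

    /* Prints the characters from start to end (inclusive), then a newline. */
    void printLine(const char * start, const char * end){
        const char * p = start;
        while ( p <= end )
            putchar(*p++);
        putchar('\n');
    }

    int main ( int argc , char ** argv ) {
        if( argc <= 2 )
            exit(1);

        char * start = argv[1];     /* first character of the current line */
        char * lastChar = argv[1];  /* last character known to fit on the line */
        char * current = argv[1];   /* scanning position */
        int wrapLength = atoi(argv[2]);
        if( wrapLength <= 0 )       /* reject a non-numeric or non-positive width */
            exit(1);

        /* Characters consumed on the current line. Starting at 0 (not 1) lets a
           word that ends exactly at column wrapLength stay on the line. */
        int chars = 0;

        while( *current != '\0' ){
            while( chars <= wrapLength ){
                /* Skip to the end of the next word. */
                while ( !isDelim( *current ) ) ++current, ++chars;
                if( chars <= wrapLength){
                    if(*current == '\0'){
                        /* The rest of the string fits on this line. */
                        puts(start);
                        return 0;
                    }
                    lastChar = current-1;
                    current++,chars++;
                }
            }

            /* lastChar never moved: a single word is too long, print it whole. */
            if( lastChar == start )
                lastChar = current-1;

            printLine(start,lastChar);
            current = lastChar + 1;

            /* Skip the delimiters between this line and the next. */
            while(isDelim(*current)){
                if( *current == '\0')
                    return 0;
                else
                    ++current;
            }

            start = current;
            lastChar = current;
            chars = 0;
        }
        return 0;
    }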
So basically, I have start and lastChar that I want to set as the start of a line and the last character of a line. When those are set, I output to standard output all the characters from start to end, then output a '\n', and move on to the next line.
Initially everything points to the start, then I skip words with the while(!isDelim(*current)) ++current,++chars;. As I do that, I remember the last character that was before 80 chars (lastChar).
If, at the end of a word, I have passed my number of chars (80), then I get out of the while(chars <= wrapLength) block. I output all the characters between start and lastChar and a newline.
Then I set current to lastChar+1 and skip delimiters (and if that leads me to the end of the string, we're done, return 0). Set start, lastChar and current to the start of the next line.
The
    if(*current == '\0'){
        puts(start);
        return 0;
    }
part is for strings that are too short to be wrapped even once. I added this just before writing this post because I tried a short string and it didn't work.
I feel like this might be doable in a more elegant way. If anyone has anything to suggest I'd love to try it.
And as I wrote this I asked myself, "What's going to happen if I have a string that is one word that is longer than my wrapLength?" Well, it didn't work. So I added the
    if( lastChar == start )
        lastChar = current-1;
before the printLine() statement (if lastChar hasn't moved, then we have a word that is too long for a single line so we just have to put the whole thing on the line anyway).
I took the comments out of the code since I'm writing this but I really feel that there must be a better way of doing this than what I have that wouldn't need comments.
So that's the story of how I wrote this thing. I hope it can be of use to people and I also hope that someone will be unsatisfied with my code and propose a more elegant way of doing it.
It should be noted that it works for all edge cases: words too long for a line, strings that are shorter than one wrapLength, and empty strings.
I can't claim this is bug-free, but I needed one that word wrapped and obeyed boundaries of indentation. I claim nothing about this code other than that it has worked for me so far. This is an extension method and violates the integrity of the StringBuilder, but it could be made with whatever inputs / outputs you desire.
    using System;
    using System.Linq;
    using System.Text;

    // Extension methods must live in a static class; the class name is arbitrary.
    public static class StringBuilderExtensions
    {
        // Assumes width is larger than the expanded indentation of every line.
        public static void WordWrap(this StringBuilder sb, int tabSize, int width)
        {
            string[] lines = sb.ToString().Replace("\r\n", "\n").Split('\n');
            sb.Clear();
            for (int i = 0; i < lines.Length; ++i)
            {
                var line = lines[i];
                if (line.Length < 1)
                    sb.AppendLine(); //empty lines
                else
                {
                    int indent = line.TakeWhile(c => c == '\t').Count(); //tab indents
                    line = line.Replace("\t", new String(' ', tabSize)); //need to expand tabs here
                    string lead = new String(' ', indent * tabSize); //create the leading space
                    do
                    {
                        //get the string that fits in the window
                        string subline = line.Substring(0, Math.Min(line.Length, width));
                        if (subline.Length < line.Length && subline.Length > 0)
                        {
                            //find the last space to break at (none needed if we already end on one)
                            int lastword = subline.LastOrDefault() == ' ' ? -1 : subline.LastIndexOf(' ', subline.Length - 1);
                            //ignore spaces inside the indent: breaking there would loop forever
                            if (lastword >= lead.Length)
                                subline = subline.Substring(0, lastword);
                            sb.AppendLine(subline);
                            //next part
                            line = lead + line.Substring(subline.Length).TrimStart();
                        }
                        else
                        {
                            sb.AppendLine(subline); //everything fits
                            break;
                        }
                    }
                    while (true);
                }
            }
        }
    }
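For comparison, the same indentation-preserving behaviour can be approximated in Python with the standard library's `textwrap.wrap`, which accepts `initial_indent` and `subsequent_indent`. This is a sketch rather than a port: `wrap_indented` is my own name, and its break positions will not match the C# method in every case.

```python
import textwrap


def wrap_indented(text, width, tab_size=4):
    """Wrap each line to `width`, repeating its leading tab indentation
    (expanded to spaces) on every continuation line."""
    out = []
    for line in text.replace("\r\n", "\n").split("\n"):
        body = line.lstrip("\t")
        indent = " " * (tab_size * (len(line) - len(body)))
        wrapped = textwrap.wrap(
            body.expandtabs(tab_size),
            width,
            initial_indent=indent,
            subsequent_indent=indent,
        )
        out.extend(wrapped or [""])  # textwrap.wrap returns [] for empty input
    return "\n".join(out)


print(wrap_indented("\taaa bbb ccc", 11))
```

Here `width` includes the indent, as it does in the C# version.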
I may as well chime in with a Perl solution that I made, because GNU fold -s was leaving trailing spaces and showing other bad behavior. This solution does not (properly) handle text containing tabs, backspaces, embedded carriage returns or the like, although it does handle CRLF line endings, converting them all to just LF. It makes minimal changes to the text: in particular, it never splits a word (it doesn't change wc -w), and for text with no more than a single space in a row (and no CR) it doesn't change wc -c (because it replaces a space with LF rather than inserting LF).
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $WIDTH = 80;

    # An optional leading numeric argument overrides the default width.
    if (@ARGV && $ARGV[0] =~ /^[1-9][0-9]*$/) {
        $WIDTH = $ARGV[0];
        shift @ARGV;
    }

    while (<>) {
        s/\r\n$/\n/;
        chomp;

        # Short lines pass through unchanged.
        if (length $_ <= $WIDTH) {
            print "$_\n";
            next;
        }

        @_=split /(\s+)/;

        # make @_ start with a separator field and end with a content field
        unshift @_, "";
        push @_, "" if @_%2;

        my ($sep,$cont) = splice(@_, 0, 2);
        do {
            if (length $cont > $WIDTH) {
                # A word longer than the width goes on a line of its own.
                print "$cont";
                ($sep,$cont) = splice(@_, 0, 2);
            }
            elsif (length($sep) + length($cont) > $WIDTH) {
                # Separator plus word is too wide: right-align the word.
                printf "%*s%s", $WIDTH - length $cont, "", $cont;
                ($sep,$cont) = splice(@_, 0, 2);
            }
            else {
                # Fill the line with as many separator/word pairs as fit.
                my $remain = $WIDTH;
                { do {
                    print "$sep$cont";
                    $remain -= length $sep;
                    $remain -= length $cont;
                    ($sep,$cont) = splice(@_, 0, 2) or last;
                }
                while (length($sep) + length($cont) <= $remain);
                }
            }
            print "\n";
            $sep = "";
        }
        # Test length rather than truth so that a word "0" does not end the loop.
        while (defined $cont && length $cont);
    }
@ICR, thanks for sharing the C# example. I did not succeed using it, but I came up with another solution. If there is any interest in this, please feel free to use this: WordWrap function in C#. The source is available on GitHub. I've included unit tests / samples.
Here is a word-wrap algorithm I've written in C#. It should be fairly easy to translate into other languages (except perhaps for IndexOfAny).
    // Requires: using System; using System.Collections.Generic; using System.Text;

    static char[] splitChars = new char[] { ' ', '-', '\t' };

    private static string WordWrap(string str, int width)
    {
        string[] words = Explode(str, splitChars);

        int curLineLength = 0;
        StringBuilder strBuilder = new StringBuilder();
        for(int i = 0; i < words.Length; i += 1)
        {
            string word = words[i];
            // If adding the new word to the current line would be too long,
            // then put it on a new line (and split it up if it's too long).
            if (curLineLength + word.Length > width)
            {
                // Only move down to a new line if we have text on the current line.
                // Avoids situation where
                // wrapped whitespace causes empty lines in text.
                if (curLineLength > 0)
                {
                    strBuilder.Append(Environment.NewLine);
                    curLineLength = 0;
                }

                // If the current word is too long
                // to fit on a line (even on its own),
                // then split the word up.
                while (word.Length > width)
                {
                    strBuilder.Append(word.Substring(0, width - 1) + "-");
                    word = word.Substring(width - 1);

                    strBuilder.Append(Environment.NewLine);
                }

                // Remove leading whitespace from the word,
                // so the new line starts flush to the left.
                word = word.TrimStart();
            }
            strBuilder.Append(word);
            curLineLength += word.Length;
        }

        return strBuilder.ToString();
    }

    private static string[] Explode(string str, char[] splitChars)
    {
        List<string> parts = new List<string>();
        int startIndex = 0;
        while (true)
        {
            int index = str.IndexOfAny(splitChars, startIndex);

            if (index == -1)
            {
                parts.Add(str.Substring(startIndex));
                return parts.ToArray();
            }

            string word = str.Substring(startIndex, index - startIndex);
            char nextChar = str.Substring(index, 1)[0];
            // Dashes and the like should stick to the word occurring before them.
            // Whitespace doesn't have to.
            if (char.IsWhiteSpace(nextChar))
            {
                parts.Add(word);
                parts.Add(nextChar.ToString());
            }
            else
            {
                parts.Add(word + nextChar);
            }

            startIndex = index + 1;
        }
    }
It's fairly primitive - it splits on spaces, tabs and dashes.
It does make sure that dashes stick to the word before them (so you don't end up with "stack -overflow"), though it doesn't favour moving small hyphenated words to a new line rather than splitting them.
It does split up words if they are too long for a line.
It's also fairly culturally specific, as I don't know much about the word-wrapping rules of other cultures.
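The "dashes stick to the preceding word" rule is the part that is easiest to get wrong when translating, so here is a small Python sketch of just that idea. It is not a port of the C# above: `explode` uses a regular expression instead of IndexOfAny, it does not emit the empty strings the C# version produces between adjacent delimiters, and `wrap_tokens` (my own name) does not split overlong words.

```python
import re

# A word ending in a dash, or a plain word, or a single space/tab.
TOKEN = re.compile(r"[^ \t-]*-|[^ \t-]+|[ \t]")


def explode(text):
    """Split on spaces, tabs and dashes, keeping each dash attached to
    the word in front of it."""
    return TOKEN.findall(text)


def wrap_tokens(text, width):
    """Greedy wrap over the tokens produced by explode()."""
    lines, current = [], ""
    for token in explode(text):
        if current and len(current) + len(token) > width:
            lines.append(current.rstrip())
            current = ""
            token = token.lstrip()  # new lines start flush left
        current += token
    if current.strip():
        lines.append(current.rstrip())
    return lines


print(explode("stack-overflow is"))
print(wrap_tokens("stack-overflow", 8))
```

Because the dash travels with "stack", a break can only fall after it, never before it.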