Formatting a string into multiple lines of a specific length in C/C++
Is there a common C/C++ library (or common technique) for taking one or more lines of input text and splitting the words into separate lines, where each line of output has a maximum width and words are not split across lines? Whitespace may be either collapsed or preserved. Punctuation must be preserved. A small, compact library is preferred.
I could easily spend an afternoon putting something together that works, but I would like to know if there is something common out there so I don't reinvent the wheel. Bonus points if the input line can contain a format specifier to indicate an indentation level for the output lines.
Example input:
"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."
Example output (target width = 60)
123456789012345678901234567890123456789012345678901234567890 Line added to show where 60 is
Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.
Comments (8)
I think what you may be looking for is:
Here is a small function with which you can do what you want. It returns a list of the lines. You can remove all of the std:: qualifiers if you want with using namespace std; or, better, using std::list; using std::string; using std::size_t; but I didn't want to assume you did.
This program using that function:
Prints your example text:
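A minimal sketch of a function along these lines, assuming greedy packing of whitespace-separated words. The name wrap_words and its parameters are illustrative and are not taken from the answer:

```cpp
#include <cstddef>
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: greedily packs whitespace-separated words into lines
// of at most `width` characters. Runs of whitespace collapse to one space.
// A single word longer than `width` is kept whole on its own line.
std::list<std::string> wrap_words(const std::string& text, std::size_t width)
{
    std::list<std::string> lines;
    std::istringstream words(text);
    std::string word;
    std::string line;
    while (words >> word) {
        // The +1 accounts for the space that would join `line` and `word`.
        if (!line.empty() && line.size() + 1 + word.size() > width) {
            lines.push_back(line);
            line.clear();
        }
        if (!line.empty()) {
            line += ' ';
        }
        line += word;
    }
    if (!line.empty()) {
        lines.push_back(line);
    }
    return lines;
}
```

Calling wrap_words(text, 60) and printing each element followed by a newline gives one output line per list entry. Punctuation is preserved because it travels with the word it is attached to.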
If you want to do the job in C, you could try the w_wrap.c and w_wrap.h that I posted to Fidonet C_ECHO 20 years ago or so.
If you want to do the job in C++, it seems like you could simplify the code a bit:
To add indentation, you'd use something like:
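One possible shape for this, as a sketch only: every output line is prefixed with a fixed number of spaces, and the indent is assumed to count toward the maximum width. The name wrap_indented and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: wraps `text` to `width` columns and prefixes every
// output line with `indent` spaces. Assumption: the indent counts toward
// the width, so each line holds at most `width - indent` characters of text.
std::string wrap_indented(const std::string& text, std::size_t width,
                          std::size_t indent)
{
    const std::string pad(indent, ' ');
    std::istringstream words(text);
    std::string word;
    std::string out;
    std::size_t column = 0;   // characters already written on the current line
    bool first = true;
    while (words >> word) {
        if (first || column + 1 + word.size() > width) {
            // Start a new line: newline (except before the first), then indent.
            if (!first) {
                out += '\n';
            }
            out += pad;
            out += word;
            column = indent + word.size();
            first = false;
        } else {
            out += ' ';
            out += word;
            column += 1 + word.size();
        }
    }
    return out;
}
```

Reading the indent level from a format specifier embedded in the input, as the question asks, would need a parsing step in front of this; that step is not shown.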
Given that you're doing I/O, it probably doesn't matter much in this case, but if you were doing this under other circumstances, you might want to consider a different algorithm. Rather than copy one word at a time until you exceed the specified width, you can go directly to the maximum line width in the input, and walk backwards through the input string from there until you find whitespace. At least given typical word lengths, you'll only walk back somewhere around 3 characters on average, rather than walking forward through an average of (say) 60 characters. This would be particularly relevant using something like C strings, where you were storing a pointer to the beginning of each line, without copying the content.
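The backward-scanning idea can be sketched as below. This assumes plain spaces are the only separators and records each line as an (offset, length) pair into the original string, so no text is copied. The name wrap_spans is illustrative.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns one (offset, length) span per output line.
// Assumptions: ' ' is the only separator, and a word longer than `width`
// is split at `width` because there is nowhere else to break.
std::vector<std::pair<std::size_t, std::size_t>>
wrap_spans(const std::string& text, std::size_t width)
{
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    if (width == 0) {
        return spans;
    }
    const std::size_t n = text.size();
    std::size_t start = 0;
    while (start < n) {
        // Skip spaces left over from the previous break.
        while (start < n && text[start] == ' ') {
            ++start;
        }
        if (start == n) {
            break;
        }
        std::size_t end = start + width;   // one past the widest possible line
        if (end >= n) {
            end = n;                       // the remainder fits on one line
        } else if (text[end] != ' ') {
            // Jumped into the middle of a word: walk back to the last space.
            std::size_t back = end;
            while (back > start && text[back] != ' ') {
                --back;
            }
            if (back > start) {
                end = back;                // break at that space
            }                              // else: over-long word, hard split
        }
        // Drop trailing spaces from the recorded span.
        std::size_t stop = end;
        while (stop > start && text[stop - 1] == ' ') {
            --stop;
        }
        spans.emplace_back(start, stop - start);
        start = end;
    }
    return spans;
}
```

Each line can then be printed with text.substr(offset, length), or written directly from text.data() + offset to avoid the copy.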
Here's my approach. It's certainly not the fastest, but I tried to make it as readable as possible. The result is the same as your example.
Yes: load it into a character array, then use strtok to break it into words, using a space as the word separator.
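A sketch of that approach, assuming a greedy fill of each line. Because std::strtok overwrites separators in its buffer, the sketch tokenizes a private copy of the input; the name wrap_with_strtok is illustrative.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits on spaces with std::strtok and rejoins the
// words into lines of at most `width` characters, separated by '\n'.
std::string wrap_with_strtok(const std::string& text, std::size_t width)
{
    // std::strtok writes '\0' over separators, so work on a mutable copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::string out;
    std::size_t column = 0;   // characters already on the current line
    for (char* word = std::strtok(buffer.data(), " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        const std::size_t length = std::strlen(word);
        // Start a new line if the word (plus a joining space) will not fit.
        if (column != 0 && column + 1 + length > width) {
            out += '\n';
            column = 0;
        }
        if (column != 0) {
            out += ' ';
            ++column;
        }
        out += word;
        column += length;
    }
    return out;
}
```

Note that std::strtok keeps hidden state between calls, so this sketch is not safe to run from several threads at once.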
Use a function like this for the job:
strtok will destroy your string; this solution does not. This function also works on all whitespace, not only spaces and tabs.
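A sketch with those two properties, assuming std::isspace is an acceptable definition of whitespace. It only reads from the input, so the caller's string is left intact. The name wrap_any_space is illustrative and not taken from the answer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: scans a read-only C string, treating anything that
// std::isspace accepts (space, tab, newline, ...) as a word separator.
std::string wrap_any_space(const char* text, std::size_t width)
{
    std::string out;
    std::size_t column = 0;   // characters already on the current line
    const char* p = text;
    while (*p != '\0') {
        // Skip the whitespace in front of the next word.
        while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p))) {
            ++p;
        }
        // Measure the word without modifying the input.
        const char* word = p;
        while (*p != '\0' && !std::isspace(static_cast<unsigned char>(*p))) {
            ++p;
        }
        const std::size_t length = static_cast<std::size_t>(p - word);
        if (length == 0) {
            break;            // only trailing whitespace was left
        }
        if (column != 0 && column + 1 + length > width) {
            out += '\n';
            column = 0;
        }
        if (column != 0) {
            out += ' ';
            ++column;
        }
        out.append(word, length);
        column += length;
    }
    return out;
}
```

The cast to unsigned char matters because passing a negative char value to std::isspace is undefined behavior.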
You could probably use regex substitution: replace /(.*){,60}? +/ with $1\n, advance the string pointer and repeat (note: the ? is supposed to mean non-greedy matching). If properly implemented, the conversion could even be made in place.
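A sketch of the substitution idea using std::regex. The pattern here is a swapped-in variant, not a transcription of the one above: it uses an explicit lower bound and greedy matching, (.{1,N})(?: +|$), so that each match takes as many characters as fit before a run of spaces or the end of input. The name wrap_with_regex is illustrative, and width is assumed to be at least 1.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replaces "up to `width` characters followed by
// spaces or end of input" with those characters plus a newline.
// Assumption: width >= 1 (a zero width would make the pattern invalid).
std::string wrap_with_regex(const std::string& text, std::size_t width)
{
    const std::regex line("(.{1," + std::to_string(width) + "})(?: +|$)");
    // regex_replace handles the "advance and repeat" step internally by
    // walking over successive non-overlapping matches.
    std::string out = std::regex_replace(text, line, "$1\n");
    // The last match also gets a newline appended; drop it.
    if (!out.empty() && out.back() == '\n') {
        out.pop_back();
    }
    return out;
}
```

This version builds a new string instead of converting in place, and it breaks on spaces only.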
Here is a regex-based approach. Unlike the approaches in the other answers, it also handles newlines in the input string gracefully.
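One way such an approach might look, as a sketch under stated assumptions: the input is split at its existing newlines with std::getline, each piece is wrapped separately with a greedy pattern, and the original line breaks (including blank lines) are kept. The name wrap_paragraphs and the pattern are illustrative, not the answer's own code, and width is assumed to be at least 1.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: wraps every input line independently, so newlines
// already present in `text` survive as line breaks in the result.
// Assumption: width >= 1.
std::string wrap_paragraphs(const std::string& text, std::size_t width)
{
    const std::regex line("(.{1," + std::to_string(width) + "})(?: +|$)");
    std::istringstream input(text);
    std::string paragraph;
    std::string out;
    bool first = true;
    while (std::getline(input, paragraph)) {
        if (!first) {
            out += '\n';      // restore the newline that getline consumed
        }
        first = false;
        std::string wrapped = std::regex_replace(paragraph, line, "$1\n");
        // Drop the newline appended after the final match of this piece.
        if (!wrapped.empty() && wrapped.back() == '\n') {
            wrapped.pop_back();
        }
        out += wrapped;
    }
    return out;
}
```

A single trailing newline at the very end of the input is not reproduced, because std::getline does not report an empty final line.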