Formatting a string into multiple lines of a specific length in C/C++

Posted on 2024-11-27 03:06:56

Is there a common C/C++ library (or common technique) for taking one or more lines of input text and splitting the words into separate lines, where each line of output has a maximum width and words are not split across lines? Collapsing or preserving whitespace is fine either way. Punctuation must be preserved. A small, compact library is preferred.

I could easily spend an afternoon putting something together that works, but I would like to know if there is something common out there so I don't reinvent the wheel. Bonus points if the input line can contain a format specifier to indicate an indentation level for the output lines.

Example input:
"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."

Example output (target width = 60)

123456789012345678901234567890123456789012345678901234567890   Line added to show where 60 is
Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.

Comments (8)

套路撩心 2024-12-04 03:06:56

I think what you may be looking for is:

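/* Needs <stdio.h> and <string.h>; assumes the original is stored in src. */
char temp[61];                      /* 60 characters plus the terminator */
size_t len = strlen(src), x = 0, cnt;
while (x < len)
{
    cnt = len - x;                  /* the remainder may fit on one line */
    if (cnt > 60)
    {
        cnt = 60;                   /* src[x + 60] may itself be the breaking space */
        while (cnt > 0 && src[x + cnt] != ' ') cnt--;
        if (cnt == 0) cnt = 60;     /* no space in range: hard-break the word */
    }
    memcpy(temp, src + x, cnt);
    temp[cnt] = '\0';               /* strncpy would not have terminated the copy */
    printf("%s\n", temp);
    x += cnt;
    if (src[x] == ' ') x++;         /* skip the space that was broken on */
}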
甩你一脸翔 2024-12-04 03:06:56

Here is a small function that does what you want. It returns a list of the lines. You can remove all of the std:: qualifiers if you want by writing using namespace std; or, better, using std::list; using std::string; using std::size_t; but I didn't want to assume you had.

#include <iostream>
#include <list>
#include <string>

std::list<std::string> wraptext(std::string input, std::size_t width) {
    std::size_t curpos = 0;
    std::size_t nextpos = 0;

    std::list<std::string> lines;
    std::string substr = input.substr(curpos, width + 1);

    // Break while a full window of width + 1 characters remains and it contains a space.
    while (substr.length() == width + 1 && (nextpos = substr.rfind(' ')) != input.npos) {
        lines.push_back(input.substr(curpos, nextpos));
        curpos += nextpos + 1;
        substr = input.substr(curpos, width + 1);
    }

    // Whatever is left becomes the last line. Note that the loop also stops at a
    // window with no space, so a word longer than width ends the wrapping early.
    if (curpos != input.length())
        lines.push_back(input.substr(curpos, input.npos));

    return lines;
}

This program uses that function:

int main() {
    std::string input = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";

    std::list<std::string> l = wraptext(input, 60);

    for (auto i = l.begin(); i != l.end(); ++i)
        std::cout << *i << std::endl;

    std::cin.get();
}

Prints your example text:

Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.
策马西风 2024-12-04 03:06:56

If you want to do the job in C, you could try the w_wrap.c and w_wrap.h that I posted to Fidonet C_ECHO 20 years ago or so.

If you want to do the job in C++, it seems like you could simplify the code a bit:

#include <sstream>
#include <string>
#include <iostream>

void wrap(std::string const &input, size_t width, std::ostream &os, size_t indent = 0)
{ 
    std::istringstream in(input);

    os << std::string(indent, ' '); 
    size_t current = indent;
    std::string word;

    while (in >> word) {
        if (current > indent && current + word.size() > width) { // never wrap before the first word of a line
            os << "\n" << std::string(indent, ' ');
            current = indent;
        }
        os << word << ' ';
        current += word.size() + 1;
    }
}

#ifdef TEST 
int main() { 
    char const *in = "Shankle drumstick corned beef, chuck turkey chicken pork chop"
               " venison beef strip steak cow sausage. Tail short loin shoulder"
               " ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf,"
               " bresaola short loin tri-tip fatback pork loin sirloin shank"
               " flank biltong. Venison short loin andouille.";

    wrap(in, 60, std::cout);
    return 0;
}
#endif

To add indentation, you'd use something like:

wrap(in, 60, std::cout, 5);

Given that you're doing I/O, it probably doesn't matter much in this case, but if you were doing this under other circumstances, you might want to consider a different algorithm. Rather than copy one word at a time until you exceed the specified width, you can go directly to the maximum line width in the input, and walk backwards through the input string from there until you find whitespace. At least given typical word lengths, you'll only walk back somewhere around 3 characters on average, rather than walking forward through an average of (say) 60 characters. This would be particularly relevant using something like C strings, where you were storing a pointer to the beginning of each line, without copying the content.
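
The backward-scanning idea in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions, not the w_wrap code: the name wrap_views is hypothetical, it assumes C++17 for std::string_view, treats only the space character as breakable whitespace, and hard-breaks any word longer than the width.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns one view into `text` per output line.
// No characters are copied; each view only records a start and a length.
std::vector<std::string_view> wrap_views(std::string_view text, std::size_t width)
{
    std::vector<std::string_view> lines;
    if (width == 0) return lines;               // nothing sensible to do

    std::size_t start = 0;
    while (start < text.size()) {
        // Skip spaces left over from the previous break.
        while (start < text.size() && text[start] == ' ') ++start;
        if (start == text.size()) break;

        std::size_t len = text.size() - start;  // the remainder may fit as is
        if (len > width) {
            // Jump straight to the widest possible break, then walk back.
            std::size_t brk = start + width;    // text[brk] may be the breaking space
            while (brk > start && text[brk] != ' ') --brk;
            len = (brk > start) ? brk - start : width;  // no space found: hard break
        }
        // Drop trailing spaces so runs of spaces do not end up on the line.
        while (len > 0 && text[start + len - 1] == ' ') --len;

        lines.push_back(text.substr(start, len));
        start += len;
    }
    return lines;
}
```

Printing each returned view followed by '\n' gives the wrapped text. The views stay valid only as long as the buffer they point into, which is the usual trade-off for not copying.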

不甘平庸 2024-12-04 03:06:56

Here's my approach. It's certainly not the fastest, but I tried to make it as readable as possible. The result is the same as your example.

#include <iostream>
#include <string>


std::string splitInLines(std::string source, std::size_t width, std::string whitespace = " \t\r")
{
    std::size_t  currIndex = width - 1;
    std::size_t  sizeToElim;
    while ( currIndex < source.length() )
    {
        currIndex = source.find_last_of(whitespace,currIndex + 1); 
        if (currIndex == std::string::npos)
            break;
        currIndex = source.find_last_not_of(whitespace,currIndex);
        if (currIndex == std::string::npos)
            break;
        sizeToElim = source.find_first_not_of(whitespace,currIndex + 1) - currIndex - 1;
        source.replace( currIndex + 1, sizeToElim , "\n");
        currIndex += (width + 1); //due to the recently inserted "\n"
    }
    return source;
}

int main() {
    std::string source = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";
    std::string result = splitInLines(source , 60);
    std::cout << result;
    return 0;
}
梦境 2024-12-04 03:06:56

Yes: load it into a character array, then use strtok to break it into words, using a space as the word separator.
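
A minimal sketch of that idea, assuming a greedy fill and treating spaces, tabs, and newlines as separators. The name wrap_strtok is hypothetical. Because std::strtok writes terminators into the buffer it scans, the sketch tokenizes a private copy rather than the caller's string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: greedy word wrap built on std::strtok.
std::string wrap_strtok(const std::string &text, std::size_t width)
{
    // strtok overwrites delimiters with '\0', so work on a mutable copy.
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');

    std::string out;
    std::size_t col = 0;  // characters already on the current line
    for (char *word = std::strtok(buf.data(), " \t\n"); word != nullptr;
         word = std::strtok(nullptr, " \t\n")) {
        std::size_t len = std::strlen(word);
        if (col > 0 && col + 1 + len > width) {
            out += '\n';  // the word plus its separating space does not fit
            col = 0;
        }
        if (col > 0) {
            out += ' ';   // single space between words on the same line
            ++col;
        }
        out += word;
        col += len;
    }
    return out;
}
```

std::strtok keeps its position in hidden static state, so this sketch is not safe to call from several threads at once. A word longer than the width is placed on a line of its own and left unbroken.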

玩套路吗 2024-12-04 03:06:56

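Use a function like this:

#include <stdio.h>
#include <string.h>

void put_multiline(const char *s, int width)
{
  int n, i = 0;                           /* i: characters already on the current line */
  char t[100];                            /* one word, at most 99 characters */
  while (1 == sscanf(s, "%99s%n", t, &n))
  {
    int len = (int)strlen(t);
    if (i > 0 && i + 1 + len > width)     /* word plus separating space does not fit */
      puts(""), i = 0;
    if (i > 0) putchar(' '), i++;
    fputs(t, stdout);
    i += len;
    s += n;                               /* advance past the word just read */
  }
  if (i > 0) puts("");                    /* terminate the last line */
}

strtok modifies the string it tokenizes; this solution does not. This function also treats any whitespace as a separator, not only spaces and tabs.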

潇烟暮雨 2024-12-04 03:06:56

You could probably use regex substitution: replace /(.{1,60}) +/ with $1\n, advance the string pointer, and repeat. The bounded quantifier should be greedy, so that each line is filled as far as the width allows.

If properly implemented, the conversion could be even made in-place.
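
The in-place remark can be illustrated without a regex engine. The sketch below is an assumption about what such a conversion might look like, not this answer's implementation: the hypothetical wrap_in_place overwrites selected space characters with '\n' and never changes the string's length, so it assumes words are separated by single spaces and leaves any word longer than the width unbroken.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns selected spaces into newlines inside `text`.
void wrap_in_place(std::string &text, std::size_t width)
{
    const std::size_t none = std::string::npos;
    std::size_t line_start = 0;   // index of the first character of the current line
    std::size_t last_space = none; // most recent space seen on the current line

    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {    // respect newlines already in the input
            line_start = i + 1;
            last_space = none;
            continue;
        }
        if (text[i] == ' ') last_space = i;

        // The line would be too long if it ran through index i: break at the
        // most recent space, if there is one.
        if (i - line_start + 1 > width && last_space != none) {
            text[last_space] = '\n';
            line_start = last_space + 1;
            last_space = none;
        }
    }
}
```

Because only single characters are overwritten, no memory is allocated and every index into the string stays valid, which is what makes the in-place variant attractive for a fixed buffer.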

不气馁 2024-12-04 03:06:56

Here is a regex-based approach. Unlike the approaches in the other answers, it also handles newlines in the input string gracefully.

#include <regex>
#include <iostream>
#include <string>

int main() {
  auto test = std::string{"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."};

  // Consume up to 60 characters that are followed by spaces or the end of the input string
  auto line_wrap = std::regex{"(.{1,60})(?: +|$)"};

  // Replace the space or the end of the input string with a new line
  test = regex_replace(test, line_wrap, "$1\n");

  // Trim the new line added for the end of the input string
  test.resize(test.size() - 1);

  std::cout << test << std::endl;
}