Split a large text string into variable-length strings without breaking words, preserving line breaks and spaces

Posted 2024-09-01 12:49:25

I am trying to break a large string of text into several smaller strings, each with its own maximum length. For example:

"The quick brown fox jumped over the red fence.
       The blue dog dug under the fence."

I would like code that splits this into smaller lines, where the first line has a maximum of 5 characters, the second a maximum of 11, and all remaining lines a maximum of 20, resulting in this:

Line 1: The 
Line 2: quick brown
Line 3: fox jumped over the 
Line 4: red fence.
Line 5:        The blue dog 
Line 6: dug under the fence.

Is this possible in C# or MSSQL?

入怼 2024-09-08 12:49:25

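using System.Collections.Generic;
using System.Text;

public static class TextSplitter // wrapper class (name is arbitrary) so the methods compile
{
    // Splits text into lines. lengths[k] is the maximum length of line k;
    // the last entry applies to all remaining lines.
    public static List<string> SplitString(string text, int[] lengths)
    {
        List<string> output = new List<string>();
        List<string> words = Split(text);

        int i = 0;
        int lineNum = 0;
        string s = string.Empty;
        while (i < words.Count)
        {
            if (words[i] == "\n")
            {
                // A line break in the input always ends the current line.
                output.Add(s);
                s = string.Empty;
                i++;
                if (lineNum < lengths.Length - 1)
                    lineNum++;
            }
            else if (s.Length == 0 || s.Length + words[i].Length <= lengths[lineNum])
            {
                // The token fits. A token longer than the limit is still placed
                // on an empty line, so the loop always makes progress.
                s += words[i];
                i++;
            }
            else
            {
                // The line is full: emit it and move on to the next limit.
                output.Add(s);
                s = string.Empty;
                if (lineNum < lengths.Length - 1)
                    lineNum++;
                // A space or tab that did not fit is dropped at the break.
                if (words[i] == " " || words[i] == "\t")
                    i++;
            }
        }

        if (s.Length > 0)
            output.Add(s); // emit the final, partially filled line

        return output;
    }

    // Tokenizes text into words and single whitespace characters,
    // so that spaces and line breaks survive as separate tokens.
    public static List<string> Split(string text)
    {
        List<string> result = new List<string>();
        StringBuilder sb = new StringBuilder();

        foreach (var letter in text)
        {
            if (letter == '\r')
                continue; // treat "\r\n" like "\n"

            if (letter != ' ' && letter != '\t' && letter != '\n')
            {
                sb.Append(letter);
            }
            else
            {
                if (sb.Length > 0)
                {
                    result.Add(sb.ToString());
                }

                result.Add(letter.ToString());
                sb = new StringBuilder();
            }
        }

        if (sb.Length > 0)
            result.Add(sb.ToString()); // the last word has no trailing separator

        return result;
    }
}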

This code is untested and has not been compiled, but you should get the idea.

I also think you should use a StringBuilder to build each line instead, but I did not remember how to use it.
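
Following up on the StringBuilder remark, here is a minimal sketch of how the line-building loop might look with one. The class name StringBuilderWrapSketch and the method name Wrap are hypothetical, and the handling of edge cases (a space that does not fit is dropped, a line break always ends a line, an oversized word is placed on its own line) is an assumption drawn from the expected output in the question, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringBuilderWrapSketch
{
    // Hypothetical helper: limits[k] is the maximum length of line k;
    // the last limit is reused for all remaining lines.
    public static List<string> Wrap(string text, int[] limits)
    {
        var lines = new List<string>();
        var line = new StringBuilder();   // the output line being built
        var word = new StringBuilder();   // the current run of non-whitespace
        int lineNum = 0;

        // Emits the current line and advances to the next limit.
        void EndLine()
        {
            lines.Add(line.ToString());
            line.Clear();
            if (lineNum < limits.Length - 1) lineNum++;
        }

        // Appends a token, starting a new line first if it does not fit.
        void Place(string token)
        {
            if (line.Length > 0 && line.Length + token.Length > limits[lineNum])
            {
                EndLine();
                if (token == " " || token == "\t") return; // drop the space at the break
            }
            line.Append(token); // an oversized word still lands on its own line
        }

        foreach (char c in text)
        {
            if (c == '\r') continue; // treat "\r\n" like "\n"
            if (c == ' ' || c == '\t' || c == '\n')
            {
                if (word.Length > 0) { Place(word.ToString()); word.Clear(); }
                if (c == '\n') EndLine(); else Place(c.ToString());
            }
            else
            {
                word.Append(c);
            }
        }

        if (word.Length > 0) Place(word.ToString()); // flush the last word
        if (line.Length > 0) lines.Add(line.ToString()); // flush the last line
        return lines;
    }
}
```

Calling Wrap with the text from the question and the limits { 5, 11, 20 } should give the six lines listed there, including the trailing space after "The" and the seven leading spaces before "The blue dog".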

揪着可爱 2024-09-08 12:49:25

\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z

will capture up to five characters in group 1, up to 11 in group 2, and chunks of up to 20 in group 3. Matches are split at word boundaries to avoid splitting in the middle of a word. Whitespace, line breaks, and so on count as characters and are preserved.

The trick is to get at the individual matches in the repeated group, which most regex flavors do not expose; .NET and Perl 6 do:

// requires: using System.Text.RegularExpressions;
Regex paragraphs = new Regex(@"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);
Match matchResults = paragraphs.Match(subjectString);
if (matchResults.Success) {
    string line1 = matchResults.Groups[1].Value;
    string line2 = matchResults.Groups[2].Value;
    // Captures holds one entry per repetition of group 3.
    CaptureCollection line3andup = matchResults.Groups[3].Captures;
    foreach (Capture line in line3andup) {
        // line.Value is the text of one chunk of up to 20 characters
    }
} else {
    // Match attempt failed
}

I don't know C# at all and have tried to construct this from RegexBuddy's templates and the VB code here, so please feel free to point out my coding errors.

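Note that the whitespace at the beginning of line two is captured at the end of the previous match. Two limitations are also worth knowing. First, \b never matches at the end of a string that ends in punctuation, so the match fails for text like the example (which ends in a period) unless the last chunk is also allowed to end at the end of the string, for instance with (?:\b|\z). Second, because of RegexOptions.Singleline a line break is treated as an ordinary character, so it does not force a new chunk.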
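
To make the step of iterating over the repeated group concrete, here is a minimal sketch. The class name CaptureSketch and the method name Chunks are hypothetical. The pattern is also slightly adjusted from the one in the answer, which is an assumption on my part: the third group ends in (?:\b|\z) and the final anchor is \z, so that the last chunk may end at the end of the string even when the text ends in punctuation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureSketch
{
    // Hypothetical helper: returns the chunks matched by the three groups.
    // Adjusted pattern (not the one from the answer): the repeated group may
    // end at a word boundary or at the end of the string.
    public static List<string> Chunks(string subject)
    {
        var regex = new Regex(
            @"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}(?:\b|\z))+\z",
            RegexOptions.Singleline);

        var chunks = new List<string>();
        Match m = regex.Match(subject);
        if (!m.Success)
            return chunks; // e.g. the text contains a word longer than 20 characters

        chunks.Add(m.Groups[1].Value);
        chunks.Add(m.Groups[2].Value);

        // Captures holds one entry per repetition of group 3, in order.
        foreach (Capture c in m.Groups[3].Captures)
        {
            if (c.Length > 0) // skip a possible empty final repetition
                chunks.Add(c.Value);
        }
        return chunks;
    }
}
```

Because every character is consumed by one of the groups, concatenating the returned chunks reproduces the input. Unlike the output requested in the question, a line break here stays inside a chunk rather than starting a new one.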
