Java - Split a string into sentences with a character limit

Posted on 2025-01-07 03:36:25

I want to split a text into sentences (splitting on . or using BreakIterator).
However, no resulting element may be longer than 100 characters.

Example:

Lorem ipsum dolor sit. Amet consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat, sed diam voluptua. At vero eos et accusam
et justo duo dolores.

Into this (3 elements; a sentence may be broken, but never a word):

" Lorem ipsum dolor sit. ",
" Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
  ut labore et dolore magna",
" aliquyam erat, sed diam voluptua. At vero eos et accusam
  et justo duo dolores. "

How can I do this properly?

Comments (4)

我不咬妳我踢妳 2025-01-14 03:36:25

There's probably a better way to do it, but here it goes:

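import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String... args) {

        String originalString = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore "
                + "et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores.";

        // Split into sentences first; note that split() drops the "." delimiters.
        String[] s1 = originalString.split("\\.");
        List<String> list = new ArrayList<String>();

        for (String s : s1) {
            if (s.length() > 100) {
                // Cut an over-long sentence into 100-character pieces (this can break a word).
                list.addAll(Arrays.asList(s.split("(?<=\\G.{100})")));
            } else {
                list.add(s);
            }
        }

        System.out.println(list);
    }
}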

The fixed-size split regex comes from this SO question. You could probably combine the two regexes, but I'm not sure that would be wise (:

If the regex doesn't run on Android (the \G operator is not recognized everywhere), try the other solutions linked there to split a string based on its size.
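
For reference, fixed-size splitting does not need \G at all. Below is a minimal sketch using plain substring calls. The class name FixedSizeChunks and the method chunk are made up for illustration and are not taken from the linked question. Like the regex version, it cuts straight through words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class FixedSizeChunks {

    // Cuts s into consecutive pieces of at most size characters.
    public static List<String> chunk(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < s.length(); start += size) {
            int end = Math.min(s.length(), start + size);
            parts.add(s.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefgh", 3)); // prints [abc, def, gh]
    }
}
```

In the answer's loop, the call s.split("(?<=\\G.{100})") could be swapped for chunk(s, 100) if the regex is a problem on the target platform.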

神爱温柔 2025-01-14 03:36:25

Regex will not help you much in this kind of situation.

I would split the text using spaces or . and afterwards start concatenating. Something like this:

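A sketch in Java (text is the input string):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Split on whitespace, then greedily re-join the words into lines of at most 100 characters.
LinkedList<String> words = new LinkedList<>(Arrays.asList(text.trim().split("\\s+")));
List<String> lines = new ArrayList<>();
while (!words.isEmpty()) {
    // Always take the first word, so a word longer than 100 characters cannot stall the loop
    // (such a word ends up on a line of its own that exceeds the limit).
    StringBuilder line = new StringBuilder(words.removeFirst());
    // The +1 accounts for the space that re-joins two words.
    while (!words.isEmpty() && line.length() + 1 + words.getFirst().length() <= 100) {
        line.append(' ').append(words.removeFirst());
    }
    lines.add(line.toString());
}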
離殇 2025-01-14 03:36:25

Solved (thank you Macarse for the inspiration):

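// Split before each whitespace character or period, so each delimiter stays attached to the token that follows it.
String[] words = text.split("(?=[\\s\\.])");
ArrayList<String> array = new ArrayList<String>();
int i = 0;
while (words.length > i) {
    String line = "";
    // Append tokens while the line stays under 100 characters.
    // Caveat: a single token of 100 or more characters never fits, so i stops advancing.
    while ( words.length > i && line.length() + words[i].length() < 100 ) {
        line += words[i];
        i++;
    }
    array.add(line);
}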
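
Since the question also mentions BreakIterator, here is a rough sketch of a sentence-aware variant. It keeps a sentence together when it fits on the current line and otherwise wraps it at word boundaries. It assumes words are separated by whitespace, re-joins them with single spaces, and lets a word longer than the limit overflow instead of cutting it. The names SentenceWrapper and wrap are invented for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
class SentenceWrapper {

    // Builds lines of at most limit characters, preferring to break between sentences.
    public static List<String> wrap(String text, int limit) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);

        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            // If the whole sentence does not fit on the current line, start a new line.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > limit) {
                lines.add(current.toString());
                current.setLength(0);
            }
            // Add the sentence word by word, wrapping whenever the limit would be exceeded.
            for (String word : sentence.split("\\s+")) {
                if (current.length() > 0 && current.length() + 1 + word.length() > limit) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit. Amet consetetur sadipscing elitr, "
                + "sed diam nonumy eirmod tempor invidunt ut labore et dolore "
                + "magna aliquyam erat, sed diam voluptua. At vero eos et accusam "
                + "et justo duo dolores.";
        for (String line : wrap(text, 100)) {
            System.out.println("\"" + line + "\"");
        }
    }
}
```

The tail of a wrapped sentence stays open as the current line, so a following short sentence can still be appended to it, which is the grouping the question's example asks for.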
淡紫姑娘! 2025-01-14 03:36:25

Following the previous solutions, I quickly ran into an infinite loop when a single word exceeds the limit (very unlikely with a limit of 100, but unfortunately my environment is very constrained). So I added a fix of sorts for this edge case: any over-long word is first cut into pieces no longer than the limit.

import java.util.*;

public class Main
{
    public static void main(String[] args) {
        sentenceToLines("In which of the following, a person is constantly followed/chased by another person or group of several people?", 15);
    }

    private static ArrayList<String> sentenceToLines(String s, int limit) {
        String[] words = s.split("(?=[\\s\\.])");
        ArrayList<String> wordList =  new ArrayList<String>(Arrays.asList(words));
        ArrayList<String> array = new ArrayList<String>();
        int i = 0, temp;
        String word, line;
        while (i < wordList.size()) {
            line = "";
            temp = i;
            // split the long words to the size of the limit
            while(wordList.get(i).length() > limit) {
                word = wordList.get(i);
                wordList.add(i++, word.substring(0, limit));
                wordList.add(i, word.substring(limit));
                wordList.remove(i+1);
            }
            i = temp;
            // continue making lines with newly split words
            while ( i < wordList.size() && line.length() + wordList.get(i).length() <= limit ) {
                line += wordList.get(i);
                i++;
            }
            System.out.println(line.trim());
            array.add(line.trim());
        }
        return array;
    }
    
}