Truncate a long string in Python - but only after specific characters

Posted on 2025-01-26 18:33:46, word count 1439, 4 views, 0 comments

I use textwrap to split a long string into chunks of at most 280 characters each. I don't want the split to occur at arbitrary positions, though; it should only occur after specific characters. In my case, that is after the € sign followed by a single line break \n.

This is my code:

import math
import textwrap

query = 'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n (...)'

for item in [query]:
    # obtain length of string
    item_length = len(item)

    # check length
    if item_length <= 280:
        pass  # do something here
    else:
        # how many 280-character chunks are needed (as a float)
        item_length_limit = item_length / 280

        # spread the text evenly over that many chunks
        item_chunk_length = item_length / math.ceil(item_length_limit)

        # chunk the item into individual pieces
        item_chunks = textwrap.wrap(item, math.ceil(item_chunk_length),
                                    break_long_words=False, replace_whitespace=False)

        # iterate over the chunks, numbering them from 1
        for x, chunk in enumerate(item_chunks, start=1):
            print(f'{chunk} {x}/{len(item_chunks)}')

Current output (split at 60 characters for convenience):

Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 1/3
dolor 0.2€\n Lorem ipsum 0.4€\n Lorem ipsum 0.4€\n Lorem 2/3
Ipsum 0.4€ 3/3

Desired output:

Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n 1/4
Lorem ipsum dolor 0.2€\n 2/4
Lorem ipsum 0.4€\n Lorem ipsum 0.4€\n 3/4
Lorem Ipsum 0.4€ 4/4

Comments (3)

动次打次papapa 2025-02-02 18:33:46

This won't be the best algorithm out there, but it gets the job done.

import re

# '&' is used as the separator in this example
query = f"{'a'*100}&{'b'*150}&{'c'*210}&{'d'*200}&{'e'*70}&"

# note: re.split drops the '&' separators themselves
chunks = re.split('&', query)

def joiner(chunks):
    """Merge neighbouring chunks pairwise until no pair fits under 280 characters."""
    i = 0
    newchunks = []
    while i < len(chunks):
        # merge this chunk with the next one if the pair stays under 280 characters
        if i + 1 < len(chunks) and len(chunks[i]) + len(chunks[i+1]) < 280:
            newchunks.append(chunks[i] + chunks[i+1])
            i += 2
        else:
            newchunks.append(chunks[i])
            i += 1
    if chunks == newchunks:  # nothing was merged: maximum chunking reached
        return chunks
    return joiner(newchunks)

To print the values, just print the return value of this function, e.g. print(joiner(chunks)).
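The approach above drops the separators when splitting and merges neighbours pairwise. As an alternative, here is a minimal sketch of a single greedy pass that keeps each separator at the end of its piece and packs pieces into chunks up to a limit. The function name `pack_chunks`, its parameters, and the shortened sample string with a limit of 40 are assumptions made for illustration, not part of the original post.

```python
def pack_chunks(text, sep='€\n', limit=280):
    """Greedily pack separator-terminated pieces into chunks of at most `limit` characters."""
    # split at each separator, then re-attach it to the end of its piece
    parts = text.split(sep)
    pieces = [p + sep for p in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])

    chunks, current = [], ''
    for piece in pieces:
        # start a new chunk when adding this piece would exceed the limit
        if current and len(current) + len(piece) > limit:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


# hypothetical sample data; a small limit is used so the split is visible
query = 'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n Lorem ipsum 0.4€'
chunks = pack_chunks(query, limit=40)
for n, chunk in enumerate(chunks, start=1):
    print(f'{chunk!r} {n}/{len(chunks)}')
```

A single piece that is longer than the limit is left as one oversized chunk in this sketch; it would need extra handling (for example a fallback to textwrap) if that can occur.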

心如狂蝶 2025-02-02 18:33:46

I'm not 100% sure I understood your question, but are you looking for something like this?

query.split('€\n')

It will create a list where each entry is the piece of your string between two occurrences of '€\n'. Note that the '€\n' separator itself is not kept in the pieces.
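Since the question asks for the split to happen after the separator, it may help to compare str.split with a variant that keeps the separator. The sketch below uses re.split with a lookbehind, which relies on splitting at empty matches (supported from Python 3.7); the sample string and the variable names `plain` and `kept` are assumptions for illustration.

```python
import re

# hypothetical sample data in the shape used by the question
query = 'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n Lorem ipsum 0.4€'

# str.split removes the separator from the resulting pieces
plain = query.split('€\n')

# a lookbehind splits right after the separator, so each piece keeps its '€\n'
kept = re.split(r'(?<=€\n)', query)

for piece in kept:
    print(repr(piece))
```

With the separators kept, joining the pieces back together reproduces the original string, which makes it straightforward to regroup them into larger chunks afterwards.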

心凉怎暖 2025-02-02 18:33:46

This will work:

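# raw strings: \n here is a literal backslash followed by n, not a line break
query = r'Lorem ipsum dolor\n\n Lorem ipsum 0.5€\n Lorem ipsum 0.2€\n (...)'
# split at every literal "0.5€\n"
split_string = query.split(r"0.5€\n")
for part in split_string:
    print(part)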

This splits the string into a list and then prints each part.

Hope this helps :)
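Whether raw strings are needed depends on whether the text contains real line breaks or a literal backslash followed by n. The short sketch below illustrates the difference; the sample strings and variable names are assumptions for illustration.

```python
# in a regular literal, \n is a single newline character
regular = 'Lorem 0.5€\n ipsum'
# in a raw literal, \n is two characters: a backslash and the letter n
raw = r'Lorem 0.5€\n ipsum'

# splits at the real line break
split_regular = regular.split('€\n')
# the raw string contains no real line break, so nothing is split
split_raw_mismatch = raw.split('€\n')
# splits at the literal backslash-n
split_raw = raw.split(r'€\n')

print(split_regular)
print(split_raw_mismatch)
print(split_raw)
```

If the string in the question comes from a regular literal or from real input with line breaks, the non-raw separator '€\n' is the one that matches.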
