Split a string into chunks of a maximum size

Posted 2025-02-13 14:39:17

I am currently working on a feature for a Discord bot that lets users fetch the lyrics of the currently playing song. Sadly, Discord limits each embed field value to 1024 characters, so songs with a lot of lyrics get cut off or throw an error.

To avoid this, I tried to split the lyrics into separate pages of 200 words each. (Obviously this still leaves room for error with long words, and it isn't really optimized for this use case.)

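import re

def create_embed(lyrics, song):

    # Tokenize into words, keeping every line break as its own token
    words = re.findall(r"\S+|\n", lyrics)
    n = 200  # words per page
    # Join each run of n tokens into one page
    pages = [" ".join(words[i:i + n]) for i in range(0, len(words), n)]
    # Count the pages actually built; (len(words) // 200) + 1 is one too many
    # when len(words) is an exact multiple of 200
    num_pages = len(pages)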

The problem is that, because these are lyrics, the text gets split at really awkward positions, such as the middle of a sentence, which makes it hard to read.
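A minimal sketch that reproduces both problems. The helper name paginate_by_words and the sample inputs are made up for illustration; only the tokenization is taken from the snippet above.

```python
import re

def paginate_by_words(lyrics, n=200):
    """Word-count pagination as in the snippet above (hypothetical helper name)."""
    words = re.findall(r"\S+|\n", lyrics)
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]

# Made-up input: 200 five-letter words fit on one page but exceed 1024 characters.
long_page = paginate_by_words(" ".join(["lyric"] * 200))[0]
print(len(long_page))  # 200 * 5 letters + 199 spaces = 1199

# Made-up input: a small n shows a page ending in the middle of a line.
sample = "Shadows fall over my heart\nI black out the moon\n"
pages = paginate_by_words(sample, n=8)
print(repr(pages[0]))
```

So a fixed word count guarantees neither the character limit nor a clean break at the end of a line.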

What I want to do is treat n = 200 as the maximum range within which I search for a line break. Let's say I have this text:

Shadows fall over my heart \n
I black out the moon \n

With n = 10, this leaves me with:

Shadows fall over my heart \n I black out the

But instead I want it to stop at the last line break in this string, meaning:

Shadows fall over my heart \n

What is the simplest way to implement something like this? Would I need to search using a for loop with negative steps? That seems like a rather brute-force approach.
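For illustration, the behaviour asked for above could be sketched on the same token list as follows. This is only one possible approach under stated assumptions: the function name paginate_at_linebreaks is hypothetical, a page is cut after the last "\n" token inside its window of n tokens, and a window without any line break falls back to a hard cut at n tokens.

```python
import re

def paginate_at_linebreaks(lyrics, n=200):
    """Split into pages of at most n tokens, ending at a line break when possible."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = re.findall(r"\S+|\n", lyrics)
    pages = []
    start = 0
    while start < len(tokens):
        end = min(start + n, len(tokens))
        if end < len(tokens):
            # Walk backwards through the window to the last "\n" token.
            for i in range(end - 1, start - 1, -1):
                if tokens[i] == "\n":
                    end = i + 1  # keep the line break on this page
                    break
            # If no line break was found, end stays at the hard cut.
        pages.append(" ".join(tokens[start:end]))
        start = end
    return pages

sample = "Shadows fall over my heart \nI black out the moon \n"
pages = paginate_at_linebreaks(sample, n=10)
print(repr(pages[0]))  # 'Shadows fall over my heart \n'
```

The backwards loop only ever scans one window of n tokens, so the cost stays small even for long lyrics. Note that this still counts words, not characters, so it does not by itself guarantee that a page fits a character limit.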

浪推晚风 2025-02-20 14:39:17

I revisited this problem a while ago, and this is what I finally came up with. There is probably a faster or easier way, but I wanted to share it in case anyone else runs into the same problem.

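start_idx = 0
length = 1023  # maximum number of characters per chunk

while start_idx < len(lyrics):
    if len(lyrics) - start_idx <= length:
        # The remainder fits into one chunk, so take all of it
        end_idx = len(lyrics)
    else:
        # Cut just after the last "\n" inside the current window
        end_idx = lyrics.rfind("\n", start_idx, start_idx + length) + 1
        if end_idx <= start_idx:
            # No "\n" in the window (rfind returned -1): hard cut instead
            end_idx = start_idx + length
    print(lyrics[start_idx:end_idx])
    start_idx = end_idx

Basically, it walks through the text in windows of at most 1023 characters and uses Python's str.rfind() to find the last occurrence of "\n" in each window. The position just after that line break becomes the next start_idx.

The two fallbacks matter. If the text does not end with "\n", or if a single line is longer than the window, rfind() returns -1. Without a fallback, end_idx is then reset to 0 and the loop never ends, because it keeps searching for a line break as long as the end of the string has not been reached. Appending "\n" to the end of the lyrics avoids the first case, but not the second.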
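The same idea can be wrapped in a reusable function. The following is a sketch under assumptions rather than the exact code above: the name split_at_linebreaks and the limit parameter are hypothetical, chunks are returned as a list instead of being printed, and a line longer than limit is simply cut mid-line.

```python
def split_at_linebreaks(text, limit=1024):
    """Split text into chunks of at most `limit` characters, cutting after line breaks."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Prefer to cut just after the last line break in the window.
            cut = text.rfind("\n", start, end)
            if cut != -1:
                end = cut + 1
            # Otherwise keep the hard cut at start + limit.
        chunks.append(text[start:end])
        start = end
    return chunks

chunks = split_at_linebreaks("aaaa\nbbbb\ncccc", limit=6)
print(chunks)  # ['aaaa\n', 'bbbb\n', 'cccc']
```

Joining the returned chunks reproduces the original text, so nothing is lost, and each chunk can go into its own page or embed field.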
