How to split a very long string into a list of shorter strings in Python

Posted 2024-11-10 15:20:14

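In my current Django project I have a model that stores very long strings (5,000 to 10,000 characters or more per database entry), and I need to split them when a user requests the record (it really does need to stay a single record in the database). What I need is to return a list of shorter strings, 100 to 500 characters each, to the template (a queryset? That depends on whether the splitting happens in the "SQL" part, or whether I fetch the whole string as is and parse it in the view).

I could not find a Python split command anywhere, nor an example or any kind of answer.

I could always count words and append, but I am sure there has to be some kind of function for this sort of thing.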

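Edit: thanks everyone, but I guess I was not understood.

Example:
String: "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"

The string is a TextField of a Django model.

I need to split it, let's say every 5 words, so that I get:

['This is a very long string', 'with many many many many', 'and many more sentences and', 'there is not one character', 'that i can use to', 'split by, just by number', 'of words']

The thing is that almost every programming language has a "split by number of words" kind of utility function, but I can't find one in Python.

Thanks, Erez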


澉约 2024-11-17 15:20:14

>>> s = "This is a very long string with many many many many and many more sentences and there is not one character that i can use to split by, just by number of words"
>>> words = s.split()
>>> n = 5
>>> [' '.join(words[i:i+n]) for i in range(0, len(words), n)]
['This is a very long',
 'string with many many many',
 'many and many more sentences',
 'and there is not one',
 'character that i can use',
 'to split by, just by',
 'number of words']


很快妥协 2024-11-17 15:20:14

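Here is an idea:

def split_chunks(s, chunksize):
    pos = 0
    while pos != -1:
        # last space inside the window [pos, pos + chunksize)
        new_pos = s.rfind(" ", pos, pos + chunksize)
        if new_pos == pos:
            new_pos += chunksize  # force split in word
        yield s[pos:new_pos]
        pos = new_pos

This tries to split the string into chunks of at most chunksize characters. It tries to split at spaces, but if it can't, it splits in the middle of a word:

>>> foo = "asdf qwerty sderf sdefw regf"
>>> list(split_chunks(foo, 6))
['asdf', ' qwert', 'y', ' sderf', ' sdefw', ' regf', '']

I guess it requires some tweaking though (for instance how to handle splits that occur inside words), but it should give you a starting point.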
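
If the goal is simply pieces of at most a given length, broken at whitespace, the standard library's textwrap.wrap may already be enough. The sketch below is not part of the answer above. It assumes that dropping the whitespace at each break is acceptable and that a word longer than the width may be cut. The name split_by_width is made up for illustration.

```python
import textwrap


def split_by_width(text, width):
    """Return pieces of at most `width` characters, broken at whitespace.

    Hypothetical helper for illustration only.
    """
    # textwrap.wrap breaks at whitespace and drops the whitespace at each
    # break; a word longer than `width` is cut because break_long_words
    # defaults to True.
    return textwrap.wrap(text, width)


foo = "asdf qwerty sderf sdefw regf"
print(split_by_width(foo, 6))
```

Unlike the generator above, this gives no leading spaces and no trailing empty string. It also copes with a run of text that has no space in it. The trade-off is that the original spacing is not preserved, so joining the pieces will not reproduce the input exactly.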

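To split by number of words, do this:

def split_n_chunks(s, words_per_chunk):
    s_list = s.split()  # split on any run of whitespace
    pos = 0
    while pos < len(s_list):
        # each chunk is a list of up to words_per_chunk words
        yield s_list[pos:pos+words_per_chunk]
        pos += words_per_chunk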
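
Note that the generator above yields lists of words rather than strings. If strings are wanted, as in the question, each chunk can be joined back with a single space. A small sketch along those lines, with split_n_words as a hypothetical name:

```python
def split_n_words(s, words_per_chunk):
    """Yield strings of at most `words_per_chunk` whitespace-separated words."""
    words = s.split()
    # step through the word list in strides of words_per_chunk
    for start in range(0, len(words), words_per_chunk):
        yield " ".join(words[start:start + words_per_chunk])


text = "This is a very long string with many many many many and many more sentences"
chunks = list(split_n_words(text, 5))
print(chunks)
```

Because s.split() splits on any run of whitespace, newlines and repeated spaces in the stored text are collapsed to single spaces in the result.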

