Extracting strings from a nested list in Python

Posted 2025-01-07 17:12:00

Possible Duplicate:
Flatten (an irregular) list of lists in Python

I'm trying to use the NLTK library in Python, and more specifically the WordNet corpus, to extract all the words in a broad semantic category like 'animal'. I've managed to write a function that goes down through all the categories and extracts the words in them, but what I end up with is a huge jumble of lists within lists. The lists aren't of any predictable length or depth; they look like this:

['pet', 'pest', 'mate', 'young', 'stunt', 'giant', ['hen', 'dam', 'filly'], ['head', 'stray', 'dog', ['puppy', 'toy', 'spitz', 'pooch', 'doggy', 'cur', 'mutt', 'pug', 'corgi', ['Peke'], ['chow'], ['feist', 'fice'], ['hound', ['Lhasa', 'cairn']], ['boxer', 'husky']], ['tabby', 'tabby', 'queen', 'Manx', 'tom', 'kitty', 'puss', 'pussy', ['gib']]]

What I want is to be able to grab each of those strings out of that and return a single, unnested list. Any advice?


心凉怎暖 2025-01-14 17:12:00

In general, when you have to deal with arbitrary levels of nesting, a recursive solution is a good fit: lists within lists, parsing HTML (tags within tags), working with filesystems (directories within directories), and so on.

I haven't tested this code extensively, but I believe it should do what you want:

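ll = [1, 2, 3, [4, 5, [6, 7, 8]]]

def flatten(input_list):
    """Return a flat list of every non-list element, in left-to-right order."""
    output_list = []
    for element in input_list:
        # isinstance also accepts subclasses of list, unlike type(element) == list
        if isinstance(element, list):
            output_list.extend(flatten(element))  # recurse into the sublist
        else:
            output_list.append(element)  # strings and other leaves are kept whole
    return output_list

print(flatten(ll))  # prints [1, 2, 3, 4, 5, 6, 7, 8]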
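If you'd rather not build an intermediate list at every level, a generator variant works too. This is a sketch, not the code above: iter_flatten is a hypothetical name, and treating tuples as containers is an assumption you can drop. Strings are deliberately kept as leaves, because they are iterable themselves and a generic "is it iterable?" check would recurse into them (a one-character string iterates to itself forever).

```python
from typing import Iterable, Iterator

def iter_flatten(items: Iterable) -> Iterator:
    """Yield the leaf elements of arbitrarily nested lists/tuples, left to right."""
    for element in items:
        # Only lists and tuples are treated as containers; str is a leaf.
        if isinstance(element, (list, tuple)):
            yield from iter_flatten(element)
        else:
            yield element

# A balanced fragment of the nested word list from the question.
words = ['pet', 'pest', ['hen', 'dam', 'filly'],
         ['dog', ['puppy', ['Peke'], ['hound', ['Lhasa', 'cairn']]]]]

flat = list(iter_flatten(words))
print(flat)
# ['pet', 'pest', 'hen', 'dam', 'filly', 'dog', 'puppy', 'Peke', 'hound', 'Lhasa', 'cairn']
```

Wrapping the call in list() gives you the same single, unnested list; leaving it as a generator lets you stream the words instead.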

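In general, recursion is very easy to think about and the solutions tend to be very elegant (like the one above), but for really, really deeply nested things (think thousands of levels deep) you can run into problems like stack overflow. In CPython specifically, this shows up as a RecursionError once the call depth passes sys.getrecursionlimit(), which is 1000 by default.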
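A quick way to see the depth problem: CPython raises RecursionError once the call depth passes sys.getrecursionlimit(). In this sketch, make_deep is a hypothetical helper for the demo, and flatten is repeated (using isinstance, which behaves the same as the type check for plain lists) so the snippet runs on its own.

```python
import sys

def flatten(input_list):
    # Same recursive flatten as above, repeated so this snippet is standalone.
    output_list = []
    for element in input_list:
        if isinstance(element, list):
            output_list.extend(flatten(element))
        else:
            output_list.append(element)
    return output_list

def make_deep(depth):
    """Hypothetical helper: build [[...[0]...]] nested `depth` levels, without recursion."""
    nested = [0]
    for _ in range(depth - 1):
        nested = [nested]
    return nested

print(flatten(make_deep(100)))  # [0]: well under the limit

# Nest deeper than the current recursion limit (1000 by default in CPython).
try:
    flatten(make_deep(sys.getrecursionlimit() + 100))
    hit_limit = False
except RecursionError:
    hit_limit = True
print(hit_limit)  # True
```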

Generally this isn't a problem, but I believe a recursive function can always* be converted to a loop by managing an explicit stack yourself (it just doesn't look as nice).

  • Note: I am not crash-hot on my compsci theory here. Someone can add details or correct me if I'm wrong.
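As a sketch of that conversion, assuming plain lists are the only containers: keep an explicit stack of iterators, one per nesting level, so Python's call stack never grows. flatten_iter is a hypothetical name, not a library function.

```python
def flatten_iter(input_list):
    """Flatten arbitrarily nested lists with an explicit stack instead of recursion."""
    output_list = []
    # Each entry is an iterator over one list level that is still being walked.
    stack = [iter(input_list)]
    while stack:
        for element in stack[-1]:
            if isinstance(element, list):
                # Descend: pause this level (its iterator keeps its position)
                # and start walking the sublist.
                stack.append(iter(element))
                break
            output_list.append(element)
        else:
            # The for loop finished without a break: this level is exhausted.
            stack.pop()
    return output_list

print(flatten_iter([1, 2, 3, [4, 5, [6, 7, 8]]]))  # [1, 2, 3, 4, 5, 6, 7, 8]

# 10,000 levels deep: far past the default recursion limit.
deep = [0]
for _ in range(9_999):
    deep = [deep]
print(flatten_iter(deep))  # [0]
```

Because the depth now lives in an ordinary Python list rather than in nested function calls, the nesting is limited by memory, not by the recursion limit.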