Returning a list of words from a list of lines using regular expressions

Posted on 2024-08-28 05:38:58

I'm running the following code on a list of strings to return a list of its words:

words = [re.split('\\s+', line) for line in lines]

However, I end up getting something like:

[['import', 're', ''], ['', ''], ['def', 'word_count(filename):', ''], ...]

As opposed to the desired:

['import', 're', '', '', '', 'def', 'word_count(filename):', '', ...]

How can I unpack the lists re.split('\\s+', line) produces in the above list comprehension? Naïvely, I tried using * but that doesn't work.

(I'm looking for a simple and Pythonic way of doing this; I was tempted to write a function, but I'm sure the language accommodates this.)
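The stray '' entries in the output above are most likely trailing newlines. This assumes lines came from readlines() (the question doesn't say). In that case each line ends in '\n', which \s+ matches, so re.split leaves an empty string after it. The sketch below uses hypothetical input to reproduce the three sublists shown above. It also contrasts them with str.split() with no arguments, which splits on runs of whitespace and drops empty strings. Both forms still produce one list per line; the answers below deal with flattening.

```python
import re

# Hypothetical input, assuming the lines were read with readlines()
# and therefore still end in '\n'.
lines = ["import re\n", "\n", "def word_count(filename):\n"]

# Splitting each line on \s+ leaves '' wherever a line starts or ends with whitespace.
nested = [re.split(r'\s+', line) for line in lines]
print(nested)
# [['import', 're', ''], ['', ''], ['def', 'word_count(filename):', '']]

# str.split() with no arguments ignores leading and trailing whitespace.
print([line.split() for line in lines])
# [['import', 're'], [], ['def', 'word_count(filename):']]
```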

Answers (4)

或十年 2024-09-04 05:38:58
>>> import re
>>> from itertools import chain
>>> lines = ["hello world", "second line", "third line"]
>>> words = chain(*[re.split(r'\s+', line) for line in lines])

This will give you an iterator that can be used for looping through all words:

>>> for word in words:
...    print(word)
... 
hello
world
second
line
third
line
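One caveat: chain returns a one-shot iterator. Once the loop above has consumed it, iterating over words again yields nothing. A short sketch using the same sample lines:

```python
import re
from itertools import chain

lines = ["hello world", "second line", "third line"]
words = chain(*[re.split(r'\s+', line) for line in lines])

first_pass = [word for word in words]   # consumes the iterator
second_pass = [word for word in words]  # the iterator is already exhausted

print(first_pass)   # ['hello', 'world', 'second', 'line', 'third', 'line']
print(second_pass)  # []
```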

Creating a list instead of an iterator is just a matter of wrapping the iterator in a list call:

>>> words = list(chain(*[re.split(r'\s+', line) for line in lines]))
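A closely related form is itertools.chain.from_iterable. It is offered here as an alternative, not as the answerer's code. It takes a single iterable of iterables, so it can be fed a generator expression directly instead of unpacking a fully built list with *:

```python
import re
from itertools import chain

lines = ["hello world", "second line", "third line"]

# from_iterable pulls each per-line list from the generator expression
# one at a time as the chain is consumed, instead of building them all up front.
words = list(chain.from_iterable(re.split(r'\s+', line) for line in lines))
print(words)  # ['hello', 'world', 'second', 'line', 'third', 'line']
```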
拒绝两难 2024-09-04 05:38:58

The reason you get a list of lists is that re.split() returns a list, and the list comprehension appends each of those lists to its output as a single element.

It's unclear why you are splitting line by line (or perhaps this is just a simplified example), but if you can get the full content (all lines) as a single string, say text, you can just do

words = re.split(r'\s+', text)

If lines is the product of:

open('filename').readlines()

use

open('filename').read()

instead.
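Here is a minimal sketch of this approach, written as a hypothetical helper words_from_file; the file contents are made up. It uses a with block so the file gets closed. Unlike the per-line version, consecutive newlines count as one separator, so blank lines no longer produce '' entries. Only a trailing newline leaves a single '' at the end, and str.split() with no arguments drops that too.

```python
import os
import re
import tempfile

def words_from_file(path):
    """Hypothetical helper: split a whole file's contents on runs of whitespace."""
    with open(path) as f:
        text = f.read()  # the whole file as one string, instead of readlines()
    return re.split(r'\s+', text)

# Throwaway file with made-up contents, just for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("import re\n\ndef word_count(filename):\n")
    path = tmp.name

result = words_from_file(path)
print(result)  # ['import', 're', 'def', 'word_count(filename):', '']

# str.split() with no arguments drops the empty string left by the final newline.
with open(path) as f:
    result_split = f.read().split()
print(result_split)  # ['import', 're', 'def', 'word_count(filename):']

os.remove(path)
```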

π浅易 2024-09-04 05:38:58

You can always do this:

import re

words = []
for line in lines:
    # extend() adds each word individually instead of the whole list
    words.extend(re.split(r'\s+', line))

It's not nearly as elegant as a one-liner list comprehension, but it gets the job done.
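If the loop runs over many lines, the pattern can also be compiled once with re.compile and reused. The sketch below shows that variant; the helper name split_lines is hypothetical. Because re already caches recently used patterns, treat the benefit as mostly readability rather than a guaranteed speedup.

```python
import re

WHITESPACE = re.compile(r'\s+')  # compiled once, reused for every line

def split_lines(lines):
    """Hypothetical helper: flatten the words of every line into one list."""
    words = []
    for line in lines:
        # extend() adds the items of the returned list, not the list itself
        words.extend(WHITESPACE.split(line))
    return words

print(split_lines(["hello world", "second line", "third line"]))
# ['hello', 'world', 'second', 'line', 'third', 'line']
```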

过去的过去 2024-09-04 05:38:58

Just stumbled across this old question, and I think I have a better solution. Normally, if you nest one list comprehension inside another, each inner list is "appended" as a single element, and the for clauses read backwards compared with ordinary nested for-loops. This is not what you want:

>>> import re
>>> lines = ["hello world", "second line", "third line"]
>>> [[word for word in re.split(r'\s+', line)] for line in lines]
[['hello', 'world'], ['second', 'line'], ['third', 'line']]

However, if you want to "extend" rather than "append" the lists you're generating, just leave out the extra set of square brackets and reverse your for clauses. This puts them back in the "right" order, the same outer-to-inner order you would use for nested for-loops.

>>> [word for line in lines for word in re.split(r'\s+', line)]
['hello', 'world', 'second', 'line', 'third', 'line']

This seems like a more Pythonic solution to me since it is based on list-processing logic rather than some random-ass built-in function. Every programmer should know how to do this (especially ones trying to learn Lisp!)
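To see why the for clauses go in that order, compare the flattening comprehension with the nested for statements it is equivalent to. The clauses appear in the same order as the nested loops:

```python
import re

lines = ["hello world", "second line", "third line"]

# Nested for statements: outer loop over lines, inner loop over words.
flat_loop = []
for line in lines:
    for word in re.split(r'\s+', line):
        flat_loop.append(word)

# The comprehension lists its for clauses in the same outer-to-inner order.
flat_comp = [word for line in lines for word in re.split(r'\s+', line)]

print(flat_comp == flat_loop)  # True
print(flat_comp)  # ['hello', 'world', 'second', 'line', 'third', 'line']
```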
