Returning a list of words from a list of lines using a regular expression

Posted 2024-08-28 05:38:58

I'm running the following code on a list of strings to return a list of its words:

words = [re.split('\\s+', line) for line in lines]

However, I end up getting something like:

[['import', 're', ''], ['', ''], ['def', 'word_count(filename):', ''], ...]

As opposed to the desired:

['import', 're', '', '', '', 'def', 'word_count(filename):', '', ...]

How can I unpack the lists re.split('\\s+', line) produces in the above list comprehension? Naïvely, I tried using * but that doesn't work.

(I'm looking for a simple and Pythonic way of doing this; I was tempted to write a function, but I'm sure the language already accommodates this.)

或十年 2024-09-04 05:38:58

>>> import re
>>> from itertools import chain
>>> lines = ["hello world", "second line", "third line"]
>>> words = chain(*[re.split(r'\s+', line) for line in lines])

This will give you an iterator that can be used for looping through all words:

>>> for word in words:
...    print(word)
... 
hello
world
second
line
third
line

Creating a list instead of an iterator is just a matter of wrapping the iterator in a list call:

>>> words = list(chain(*[re.split(r'\s+', line) for line in lines]))
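A closely related variant, offered here as a sketch and not as part of the answer above, is itertools.chain.from_iterable. It takes a single iterable of iterables, so neither the intermediate list nor the * unpacking is needed. The sample lines below are an assumption, reconstructed from the output shown in the question.

```python
import re
from itertools import chain

# Sample input reconstructed from the question's output (an assumption).
lines = ["import re\n", "\n", "def word_count(filename):\n"]

# The generator expression splits each line lazily; chain.from_iterable
# flattens the per-line lists into a single stream of words.
words = list(chain.from_iterable(re.split(r'\s+', line) for line in lines))

print(words)
# ['import', 're', '', '', '', 'def', 'word_count(filename):', '']
```

The empty strings come from the trailing newline on each line; they match the "desired" output in the question.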

拒绝两难 2024-09-04 05:38:58

You get a list of lists because re.split() returns a list, which is then 'appended' as a single element to the list comprehension's output.

It's unclear why you are splitting line by line (perhaps it is just a simplified example), but if you can get the full content (all lines) as a single string, you can just do

words = re.split(r'\s+', lines)

If lines is the product of:

open('filename').readlines()

use

open('filename').read()

instead.
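A minimal sketch of that whole-text approach follows. The content string is a made-up stand-in for what open('filename').read() would return. Note that the result differs from the per-line version: a run of whitespace spanning several lines is treated as one separator, so fewer empty strings appear.

```python
import re

# Stand-in for open('filename').read(); the text itself is hypothetical.
content = "import re\n\ndef word_count(filename):\n"

# One split over the whole text: consecutive whitespace, including blank
# lines, counts as a single separator.
words = re.split(r'\s+', content)
print(words)
# ['import', 're', 'def', 'word_count(filename):', '']

# str.split() with no arguments also splits on whitespace runs and drops
# the empty strings at the start and end as well.
words_no_empties = content.split()
print(words_no_empties)
# ['import', 're', 'def', 'word_count(filename):']
```

If the empty strings are not actually wanted, content.split() avoids the regular expression altogether.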

π浅易 2024-09-04 05:38:58

You can always do this:

words = []
for line in lines:
  words.extend(re.split('\\s+',line))

It's not nearly as elegant as a one-liner list comprehension, but it gets the job done.

过去的过去 2024-09-04 05:38:58

Just stumbled across this old question, and I think I have a better solution. Normally if you want to nest a list comprehension ("append" each list), you think backwards (un-for-loop-like). This is not what you want:

>>> import re
>>> lines = ["hello world", "second line", "third line"]
>>> [[word for word in re.split(r'\s+', line)] for line in lines]
[['hello', 'world'], ['second', 'line'], ['third', 'line']]

However if you want to "extend" instead of "append" the lists you're generating, just leave out the extra set of square brackets and reverse your for-loops (putting them back in the "right" order).

>>> [word for line in lines for word in re.split(r'\s+', line)]
['hello', 'world', 'second', 'line', 'third', 'line']

This seems like a more Pythonic solution to me since it is based on list-processing logic rather than some random-ass library function. Every programmer should know how to do this (especially ones trying to learn Lisp!)
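To make the "right order" point concrete, here is a small sketch showing that the flat comprehension reads in the same order as the equivalent nested for-loops. The variable names flat and expanded are illustrative only.

```python
import re

lines = ["hello world", "second line", "third line"]

# Flat comprehension: the for-clauses appear in the same order as they
# would in nested loops (outer loop first, inner loop second).
flat = [word for line in lines for word in re.split(r'\s+', line)]

# The equivalent explicit loops.
expanded = []
for line in lines:
    for word in re.split(r'\s+', line):
        expanded.append(word)

print(flat == expanded)
# True
```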
