Regex to match pairs of a word and its trailing whitespace

Posted on 2024-12-14 08:24:22

I have a text:

"    Alice, Bob    Charlie  "

and I would like to get pairs of a word (if any) and the whitespace that follows it. That is:

[("", "    "), ("Alice,", " "), ("Bob", "    "), ("Charlie", "  ")]

In Python, I tried:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")

which almost works: it just adds an empty pair ("", "") at the end. How do I get rid of it, other than with .pop()? Also, I don't really understand why it is there at all: after it matches Charlie's whitespace it should be finished, shouldn't it?

Edit: to clarify, I do want the first pair, i.e. no word followed by some whitespace. The last one (no word, no whitespace) is the one I want to get rid of, preferably without .pop().

ぃ弥猫深巷。 2024-12-21 08:24:22

I think this would do that

re.findall(r'(\S+|^)(\s*)', s)
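
A quick check of this pattern, wrapped in a helper whose name, word_space_pairs, is hypothetical. Without re.MULTILINE, the ^ alternative can only match at the very start of the string, so the empty-word pair appears only when the text begins with whitespace, and nothing can match once the end of the string is reached:

```python
import re

def word_space_pairs(text):
    """Hypothetical helper: split text into (word, trailing whitespace) pairs."""
    # Either a run of non-whitespace, or (only at the start of the string,
    # since re.MULTILINE is not set) an empty "word", so that leading
    # whitespace is still reported as ('', '    ').
    return re.findall(r"(\S+|^)(\s*)", text)

print(word_space_pairs("    Alice, Bob    Charlie  "))
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

print(word_space_pairs("Alice Bob"))
# [('Alice', ' '), ('Bob', '')]
```

One edge case: for an empty input string the ^ branch still matches once, so word_space_pairs("") returns [('', '')].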

我喜欢麦丽素 2024-12-21 08:24:22

Try changing \s* to \s+ to require at least 1 character of whitespace:

>>> re.findall(r"(\S*)(\s+)", "    Alice, Bob    Charlie  ")
[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
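
One caveat worth checking with this approach: because \s+ requires at least one whitespace character after every word, a final word with no trailing whitespace is not returned at all. A small comparison, using a made-up input string, against the (\S+|^)(\s*) pattern from the previous answer:

```python
import re

text = "Alice Bob"  # the last word has no trailing whitespace

# \s+ requires at least one whitespace character after each word,
# so the final "Bob" cannot be matched and is silently dropped.
print(re.findall(r"(\S*)(\s+)", text))    # [('Alice', ' ')]

# For comparison, the (\S+|^)(\s*) pattern keeps it with empty whitespace.
print(re.findall(r"(\S+|^)(\s*)", text))  # [('Alice', ' '), ('Bob', '')]
```

For the sample text in the question this makes no difference, since it ends with whitespace.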

与往事干杯 2024-12-21 08:24:22

re.findall(r"(\S+)(\s*)", "    Alice, Bob    Charlie  ")

with a + sign after the \S returns what you probably want:

[('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

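Otherwise \S*\s* can match the empty string at the end: zero-or-more followed by zero-or-more can add up to zero length. After the match for "Charlie  " ends at the end of the string, findall tries once more at that position, and because both groups are allowed to match nothing, the empty match ('', '') succeeds there.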
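
A small sketch (not part of the original answer) that makes this visible with re.finditer, which yields match objects carrying their positions. The last match is zero-length and starts and ends at index 27, which is len(s); the output shown is what current Python 3 produces:

```python
import re

s = "    Alice, Bob    Charlie  "

# Collect the position and groups of every match of the original pattern.
matches = [(m.span(), m.groups()) for m in re.finditer(r"(\S*)(\s*)", s)]
for span, groups in matches:
    print(span, groups)

# (0, 4) ('', '    ')
# (4, 11) ('Alice,', ' ')
# (11, 18) ('Bob', '    ')
# (18, 27) ('Charlie', '  ')
# (27, 27) ('', '')   <- zero-length match at len(s)
```

A zero-length match cannot occur anywhere else in the string: every character is either whitespace or non-whitespace, so one of the two groups always consumes at least one character before the end.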

Other possibilities (apart from .pop()) would be:

[a for a in re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ") if a != ('','')]

or:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")[:-1]

both of which return exactly what you need (including the whitespace at the beginning):

[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
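
Both workarounds can be checked side by side. The helper names drop_empty_pairs and drop_last_pair are hypothetical. Since, with this pattern, the zero-length ('', '') match can only happen at the end of the string, it is always the last element in current Python 3, which is why slicing it off with [:-1] is safe here; for an empty input, both give an empty list:

```python
import re

PATTERN = r"(\S*)(\s*)"

def drop_empty_pairs(text):
    # Hypothetical helper: keep every pair except the zero-length ('', '').
    return [pair for pair in re.findall(PATTERN, text) if pair != ('', '')]

def drop_last_pair(text):
    # Hypothetical helper: the zero-length match only occurs at the end
    # of the string, so it is always the last element of the list.
    return re.findall(PATTERN, text)[:-1]

s = "    Alice, Bob    Charlie  "
expected = [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
print(drop_empty_pairs(s) == expected, drop_last_pair(s) == expected)  # True True
print(drop_empty_pairs(""), drop_last_pair(""))  # [] []
```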

About the author: 与他有关