Regex to match pairs of a word and its trailing whitespace

Posted on 2024-12-14 08:24:22 · 504 words · 3 views · 0 comments

I have a text:

"    Alice, Bob    Charlie  "

and I would like to get pairs consisting of a word (if any) and the whitespace after it. That is:

[("", "    "), ("Alice,", " "), ("Bob", "    "), ("Charlie", "  ")]

In Python, I tried:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")

which almost works: it just adds an empty pair ("", "") at the end. How can I get rid of it, other than with .pop()? Also, I don't really understand why it is there at all. After matching the whitespace that follows Charlie, the search should be finished, no?

Edit: to clarify, I want to keep the first pair, i.e. no word followed by some whitespace. The last pair (no word, no whitespace) is the one I want to get rid of, preferably without .pop().

Comments (3)

ぃ弥猫深巷。 2024-12-21 08:24:22

I think this would do it:

re.findall(r'(\S+|^)(\s*)', s)
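To check the suggestion above, here is a minimal sketch, assuming Python 3. It runs the pattern (as a raw string) on the text from the question and on a second, hypothetical input that starts with a word; the names PAIR, pairs and pairs_no_lead are illustrative only.

```python
import re

# Pattern from the answer above: either a run of non-whitespace,
# or the (empty) start of the string, followed by optional whitespace.
PAIR = re.compile(r"(\S+|^)(\s*)")

# Text from the question: starts and ends with whitespace.
pairs = PAIR.findall("    Alice, Bob    Charlie  ")
print(pairs)
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

# Hypothetical input (not from the question): no leading or trailing whitespace.
pairs_no_lead = PAIR.findall("Alice Bob")
print(pairs_no_lead)
# [('Alice', ' '), ('Bob', '')]
```

The empty alternative can only match at position 0, so no empty pair can appear at the end of a non-empty text.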
我喜欢麦丽素 2024-12-21 08:24:22

Try changing \s* to \s+ to require at least 1 character of whitespace:

>>> re.findall(r"(\S*)(\s+)", "    Alice, Bob    Charlie  ")
[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
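One caveat worth checking before relying on \s+: every pair must then end in at least one whitespace character, so a final word with nothing after it is not returned. A minimal sketch, assuming Python 3; the second input is a hypothetical variant of the question's text with the trailing spaces removed.

```python
import re

pattern = r"(\S*)(\s+)"

# Text from the question: the last word is followed by whitespace.
with_trailing = re.findall(pattern, "    Alice, Bob    Charlie  ")
print(with_trailing)
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

# Hypothetical variant: same text without trailing whitespace.
# 'Charlie' is dropped because \s+ cannot match after it.
without_trailing = re.findall(pattern, "    Alice, Bob    Charlie")
print(without_trailing)
# [('', '    '), ('Alice,', ' '), ('Bob', '    ')]
```

If the input is not guaranteed to end in whitespace, this variant may silently lose the last word.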
与往事干杯 2024-12-21 08:24:22
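re.findall(r"(\S+)(\s*)", "    Alice, Bob    Charlie  ")

with a + sign after the \S (one or more non-whitespace characters instead of zero or more), returns every pair that contains a word but drops the leading whitespace-only pair: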

[('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

Otherwise, \S*\s* can match the empty string at the end of the text: zero-or-more followed by zero-or-more can also add up to zero length.

Other possibilities (apart from .pop()) would be:
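To see where the extra pair comes from, the following sketch (assuming Python 3) uses re.finditer to list the span of every match. The variable names text and spans are illustrative only.

```python
import re

text = "    Alice, Bob    Charlie  "

# Collect (start, end) and the two captured groups for every match.
spans = [(m.span(), m.groups()) for m in re.finditer(r"(\S*)(\s*)", text)]
for span, groups in spans:
    print(span, groups)
# (0, 4) ('', '    ')
# (4, 11) ('Alice,', ' ')
# (11, 18) ('Bob', '    ')
# (18, 27) ('Charlie', '  ')
# (27, 27) ('', '')
```

The last match has span (27, 27): after the whitespace following Charlie is consumed, the scanner is at the end of the text, where a pattern that is allowed to match nothing still succeeds once.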

[a for a in re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ") if a != ('','')]

or:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")[:-1]

Both of these return exactly what you need (including the whitespace at the beginning):

[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]