Partitioning a string in Python by a regular expression

Posted 2024-11-06 00:46:19

I need to split a string into an array on word boundaries (whitespace) while maintaining the whitespace.

For example:

'this is  a\nsentence'

Would become

['this', ' ', 'is', '  ', 'a', '\n', 'sentence']

I know about str.partition and re.split, but neither of them quite does what I want, and there is no re.partition.

How should I partition strings on whitespace in Python with reasonable efficiency?

皇甫轩 2024-11-13 00:46:19

Try this:

import re

s = "this is  a\nsentence"
re.split(r'(\W+)', s)  # Note the parentheses (a capturing group) and the plus sign.

The result is:

['this', ' ', 'is', '  ', 'a', '\n', 'sentence']
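
The parentheses are what preserve the separators: when the pattern contains a capturing group, re.split also returns the text that group matched. A minimal sketch contrasting the two forms (the variable names are only for illustration):

```python
import re

s = "this is  a\nsentence"

# Without a capturing group, the separators are discarded.
without_group = re.split(r'\W+', s)

# With a capturing group, each matched separator is kept as its own item.
with_group = re.split(r'(\W+)', s)

print(without_group)  # ['this', 'is', 'a', 'sentence']
print(with_group)     # ['this', ' ', 'is', '  ', 'a', '\n', 'sentence']
```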

听风念你 2024-11-13 00:46:19

The whitespace class in re is '\s', not '\W'. '\W' matches any non-word character, so it also splits on punctuation.

Compare:

import re

s = ("With a sign # written @ the beginning , that's  a\nsentence,"
     '\nno more an instruction!,\tyou know ?? "Cases" & and surprises:'
     "that will 'lways unknown **before**, in 81% of time$")

# \W+ splits on every run of non-word characters, punctuation included.
a = re.split(r'(\W+)', s)
print(a)
print(len(a))
print()

# \s+ splits on runs of whitespace only.
b = re.split(r'(\s+)', s)
print(b)
print(len(b))

produces

['With', ' ', 'a', ' ', 'sign', ' # ', 'written', ' @ ', 'the', ' ', 'beginning', ' , ', 'that', "'", 's', '  ', 'a', '\n', 'sentence', ',\n', 'no', ' ', 'more', ' ', 'an', ' ', 'instruction', '!,\t', 'you', ' ', 'know', ' ?? "', 'Cases', '" & ', 'and', ' ', 'surprises', ':', 'that', ' ', 'will', " '", 'lways', ' ', 'unknown', ' **', 'before', '**, ', 'in', ' ', '81', '% ', 'of', ' ', 'time', '$', '']
57

['With', ' ', 'a', ' ', 'sign', ' ', '#', ' ', 'written', ' ', '@', ' ', 'the', ' ', 'beginning', ' ', ',', ' ', "that's", '  ', 'a', '\n', 'sentence,', '\n', 'no', ' ', 'more', ' ', 'an', ' ', 'instruction!,', '\t', 'you', ' ', 'know', ' ', '??', ' ', '"Cases"', ' ', '&', ' ', 'and', ' ', 'surprises:that', ' ', 'will', ' ', "'lways", ' ', 'unknown', ' ', '**before**,', ' ', 'in', ' ', '81%', ' ', 'of', ' ', 'time$']
61
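
One thing to watch for with re.split and a capturing group: when the string starts or ends with a separator, the result contains empty strings at the edges (the trailing '' in the '\W' output above is an instance of this). A possible alternative, sketched here with a hypothetical helper named partition_whitespace that is not part of the answers above, is to match alternating runs with re.findall instead:

```python
import re

def partition_whitespace(text):
    """Return alternating runs of whitespace and non-whitespace, in order."""
    # Hypothetical helper: every character belongs to exactly one run,
    # so joining the result reproduces the input.
    return re.findall(r'\s+|\S+', text)

s = " that's  a\nsentence, "

print(re.split(r'(\s+)', s))
# ['', ' ', "that's", '  ', 'a', '\n', 'sentence,', ' ', '']

print(partition_whitespace(s))
# [' ', "that's", '  ', 'a', '\n', 'sentence,', ' ']
```

Both approaches scan the string once; which one fits better depends on whether the empty edge items are useful to the caller.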

猛虎独行 2024-11-13 00:46:19

Try this:

import re

re.split(r'(\W+)', 'this is  a\nsentence')
