Tokenize a string keeping delimiters in Python

Posted 2024-08-12 16:53:17

Is there any equivalent to str.split in Python that also returns the delimiters?

I need to preserve the whitespace layout for my output after processing some of the tokens.

Example:

>>> s = "\tthis is an  example"
>>> print(s.split())
['this', 'is', 'an', 'example']

>>> print(what_I_want(s))
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

Thanks!

忆伤 2024-08-19 16:53:18

>>> import re
>>> re.compile(r'(\s+)').split("\tthis is an  example")
['', '\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

蓝海似她心 2024-08-19 16:53:18

The re module provides this functionality:

>>> import re
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']

(quoted from the Python documentation).

For your example (split on whitespace), use re.split(r'(\s+)', '\tThis is an example').

The key is to enclose the regex on which to split in capturing parentheses. That way, the delimiters are added to the list of results.
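To make the difference concrete, here is a small sketch comparing the same pattern with and without the capturing group. The sample string is the one from the question; the variable names are only illustrative.

```python
import re

s = "\tthis is an  example"

# Without a capturing group, the whitespace runs are discarded.
without_group = re.split(r'\s+', s)

# With a capturing group, each whitespace run is kept as its own list item.
with_group = re.split(r'(\s+)', s)

print(without_group)  # ['', 'this', 'is', 'an', 'example']
print(with_group)     # ['', '\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
```

Because nothing is thrown away in the second form, ''.join(with_group) gives back the original string.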

Edit: As pointed out, any preceding/trailing delimiters will of course also be added to the list. To avoid that you can use the .strip() method on your input string first.
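One caveat, offered as a sketch rather than as part of the original answer: calling .strip() first also throws away the leading tab that the question wants to keep. If the goal is only to get rid of the empty strings at the ends, filtering them out may be a better fit. The helper name split_keep_whitespace is made up for this example.

```python
import re

def split_keep_whitespace(s):
    """Split s into alternating whitespace and non-whitespace tokens.

    Hypothetical helper: drops the empty strings that re.split produces
    when the input starts or ends with whitespace.
    """
    return [token for token in re.split(r'(\s+)', s) if token]

print(split_keep_whitespace("\tthis is an  example"))
# ['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
```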

单调的奢华 2024-08-19 16:53:18

Have you looked at pyparsing? Example borrowed from the pyparsing wiki:

>>> from pyparsing import Word, alphas
>>> greet = Word(alphas) + "," + Word(alphas) + "!"
>>> hello1 = 'Hello, World!'
>>> hello2 = 'Greetings, Earthlings!'
>>> for hello in hello1, hello2:
...     print('%s \u2192 %s' % (hello, greet.parseString(hello).asList()))
...
Hello, World! → ['Hello', ',', 'World', '!']
Greetings, Earthlings! → ['Greetings', ',', 'Earthlings', '!']

她如夕阳 2024-08-19 16:53:18

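Thanks, everyone, for pointing out the re module. I'm still trying to decide between that and my own function that returns a sequence:

def split_keep_delimiters(s, delims="\t\n\r "):
    """Yield alternating runs of delimiter and non-delimiter characters."""
    if not s:
        return  # nothing to yield; s[0] below would raise IndexError
    delim_group = s[0] in delims
    start = 0
    for index, char in enumerate(s):
        if delim_group != (char in delims):
            # The run type changed: emit the finished run, start a new one.
            delim_group = not delim_group
            yield s[start:index]
            start = index
    yield s[start:]  # emit the final run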

If I had time I'd benchmark them xD
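A rough way to run that comparison is the standard timeit module. The sketch below is only one possible setup: the sample input and repetition count are arbitrary, the generator is a copy of the function above with an added guard for empty input, and the regex variant uses findall. The two approaches are not strictly equivalent, since \s matches more characters than the default delims string, but they agree on this input.

```python
import re
import timeit

def split_keep_delimiters(s, delims="\t\n\r "):
    """Yield alternating runs of delimiter and non-delimiter characters."""
    if not s:
        return
    delim_group = s[0] in delims
    start = 0
    for index, char in enumerate(s):
        if delim_group != (char in delims):
            delim_group = not delim_group
            yield s[start:index]
            start = index
    yield s[start:]

splitter = re.compile(r'\s+|\S+')

# Arbitrary test input; real data may behave differently.
sample = "\tthis is an  example " * 1000

# Sanity check: both approaches produce the same tokens for this input.
assert list(split_keep_delimiters(sample)) == splitter.findall(sample)

generator_time = timeit.timeit(
    lambda: list(split_keep_delimiters(sample)), number=20)
regex_time = timeit.timeit(
    lambda: splitter.findall(sample), number=20)

print("generator: %.4f s" % generator_time)
print("regex:     %.4f s" % regex_time)
```

The absolute numbers depend on the machine and Python version, so they are best read as a relative comparison only.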

夏有森光若流苏 2024-08-19 16:53:17

How about:

import re
s = "\tthis is an  example"
splitter = re.compile(r'(\s+|\S+)')
splitter.findall(s)  # ['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']
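Since the question is about processing some tokens and then restoring the original whitespace layout, here is a minimal sketch of that round trip built on the findall approach. The helper transform_words and the choice of str.upper as the processing step are assumptions made for illustration.

```python
import re

splitter = re.compile(r'\s+|\S+')

def transform_words(s, func):
    """Apply func to every non-whitespace token, leaving whitespace as is.

    Hypothetical helper for illustration; func is any str -> str callable.
    """
    return ''.join(
        token if token.isspace() else func(token)
        for token in splitter.findall(s)
    )

s = "\tthis is an  example"
print(repr(transform_words(s, str.upper)))  # '\tTHIS IS AN  EXAMPLE'
```

Every character of the input lands in exactly one token, so joining the tokens back together reproduces the original spacing.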
