Separate number/letter tokens in Python
I'm using re.split() to separate a string into tokens. Currently the pattern I'm using as the argument is [^\dA-Za-z], which retrieves alphanumeric tokens from the string.
However, what I need is to also split tokens that have both numbers and letters into tokens with only one or the other, e.g. re.split(pattern, "my t0kens") would return ["my", "t", "0", "kens"].
I'm guessing I might need to use lookahead/lookbehind, but I'm not sure if that's actually necessary or if there's a better way to do it.
Comments (4)
Not perfect, but removing space from the list below is easy :-)
docs: "Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list."
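The snippet this comment refers to did not survive on this page. As a rough sketch of the behaviour the docs quote describes (the pattern here is my own assumption, not necessarily what the commenter posted), a capturing group around whitespace runs and digit runs keeps both in the result, and the whitespace entries can then be dropped:

```python
import re

text = "my t0kens"

# The capturing group makes re.split() return the separators too:
# runs of whitespace and runs of digits both end up in the list.
parts_with_spaces = re.split(r"(\s+|\d+)", text)
print(parts_with_spaces)  # ['my', ' ', 't', '0', 'kens']

# Drop whitespace-only and empty entries.
tokens_from_split = [p for p in parts_with_spaces if p.strip()]
print(tokens_from_split)  # ['my', 't', '0', 'kens']
```

The filter also removes the empty strings that re.split() produces when two separators are adjacent or when the string ends with a digit.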
Try the findall method instead.
Edit: Better way from Bart's comment below.
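The code for this comment is missing here as well, so the following is only a guess at what a findall-based version might look like. The function name findall_tokens is made up for illustration; the pattern matches either a run of digits or a run of ASCII letters and ignores everything else:

```python
import re

def findall_tokens(text):
    # Alternation: a run of digits OR a run of ASCII letters.
    # Anything that matches neither (spaces, punctuation) is simply skipped.
    return re.findall(r"\d+|[A-Za-z]+", text)

print(findall_tokens("my t0kens"))  # ['my', 't', '0', 'kens']
```

With this approach there is nothing to filter afterwards, since findall only returns the matches and never the gaps between them.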
By using capturing parentheses within the pattern, the tokens will also be returned. Since you only want to keep the digits and not the spaces, I've left the \s outside the parentheses, so None is returned, which can then be filtered out using a simple loop.
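The original snippet is not shown on this page, so here is a minimal sketch of what that description could correspond to. The exact pattern is an assumption on my part: digits are captured, whitespace is matched but left outside the group.

```python
import re

text = "my t0kens"

# \d+ sits inside the capturing group, \s+ does not. When the split happens
# on whitespace, the group did not take part in the match, so None is emitted.
raw_pieces = re.split(r"\s+|(\d+)", text)
print(raw_pieces)  # ['my', None, 't', '0', 'kens']

# Simple loop to drop the None entries (and any empty strings).
loop_tokens = []
for piece in raw_pieces:
    if piece:
        loop_tokens.append(piece)
print(loop_tokens)  # ['my', 't', '0', 'kens']
```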
Should be one line of code
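The one-liner itself is missing from this page. One way it might look, assuming the same split pattern with digits captured and whitespace uncaptured (the function name one_line_tokens is hypothetical), is a list comprehension that filters while splitting:

```python
import re

def one_line_tokens(text):
    # Split on whitespace or on captured digit runs, keeping only truthy pieces
    # (this discards None and empty strings in the same pass).
    return [piece for piece in re.split(r"\s+|(\d+)", text) if piece]

print(one_line_tokens("my t0kens"))  # ['my', 't', '0', 'kens']
```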