Delete digits in Python (regex)
I'm trying to delete all digits from a string.
However, the following code also deletes digits contained inside words, and obviously I don't want that.
I've been trying many regular expressions with no success.
Thanks!
import re
s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)
Result:
This must not b deletd, but the number at the end yes
Add a space before the \d+.
Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.
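A minimal sketch of the space-prefixed idea, assuming the pattern meant is a literal space followed by \d+ (the answer's own code is not preserved here). A number at the very start of the string has no space in front of it, so it survives:

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# " \d+" only matches digits that come right after a space,
# so the "3" in "b3" and "delet3d" is left alone.
print(repr(re.sub(r" \d+", "", s)))

# A leading number has no space before it and is not removed.
print(repr(re.sub(r" \d+", "", "42 apples")))
```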
Try this:
That'll match only those digits that are not part of another word.
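A sketch of a word-boundary version, assuming r"\b\d+\b" is the kind of pattern meant (the answer's exact code is not shown). \b only matches between a word character and a non-word character, so digits glued to letters are skipped; the whitespace around a removed number stays:

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# \b\d+\b: a run of digits with a word boundary on both sides.
# In "b3" and "delet3d" the digits touch letters, so there is no boundary.
result = re.sub(r"\b\d+\b", "", s)
print(repr(result))
```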
Using \s isn't very good, since it doesn't handle tabs, et al. A first cut at a better solution is:
Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:
That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
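A sketch illustrating both points, using a hypothetical helper remove_numbers(). The first two assertions show why the raw string matters; the pattern is one possible way to write the "fancier" version (the answer's exact pattern is not shown), and the last call shows the leftover space when several numbers sit at the end:

```python
import re

# In an ordinary string literal "\b" is the backspace character (U+0008);
# the raw string keeps the two characters backslash + "b" for the regex engine.
assert "\b" == "\x08"
assert r"\b" == "\\b"


def remove_numbers(s: str) -> str:
    """Hypothetical helper: drop standalone numbers, trimming edge whitespace."""
    # Alternatives are tried left to right at each position:
    #   ^\d+\b\s*   a number at the start, plus the whitespace after it
    #   \s*\b\d+$   a number at the end, plus the whitespace before it
    #   \b\d+\b     any other standalone number (its surrounding spaces stay)
    return re.sub(r"^\d+\b\s*|\s*\b\d+$|\b\d+\b", "", s)


print(repr(remove_numbers("This must not b3 delet3d, but the number at the end yes 134411")))
print(repr(remove_numbers("12 and 34 56")))
```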
To handle digit strings at the beginning of a line as well:
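A sketch of one way to do that, assuming the goal is to treat the start of each line like a preceding space (the answer's exact code is not shown). With re.MULTILINE, ^ matches after every newline; the separator captured in group 1 is put back, and the lookahead requires whitespace or the end of a line after the digits:

```python
import re

text = "123 start\nmust not b3 delet3d 456"

# (^|\s)     start of a line, or one whitespace character (captured, then restored)
# \d+        the number itself
# (?=\s|$)   followed by whitespace or the end of a line, without consuming it
pattern = r"(^|\s)\d+(?=\s|$)"
result = re.sub(pattern, r"\1", text, flags=re.MULTILINE)
print(repr(result))
```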
You could try this:
Result:
The same rule also applies to:
Result:
To match only pure integers in a string:
It does the right thing with this, matching only everything after million:
All of the other 8 regex answers on this page fail in various ways with that input.
The dash at the end of the first [0-9-] set preserves -007, and the dash in the second set preserves 8-.
Or use \d in place of 0-9, if you prefer.
at regex101
Can it be simplified?
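The answer's pattern itself is not shown; the following is a reconstruction from the explanation above, so treat it as an assumption. The lookbehind's [0-9-] set keeps -007, the lookahead's set keeps 8-, and the \b anchors keep digits that are attached to letters:

```python
import re

# (?<![0-9-])  not preceded by a digit or a dash (keeps "-007")
# \b[0-9]+\b   a run of ASCII digits that is a whole word (skips "b3")
# (?![0-9-])   not followed by a digit or a dash (keeps "8-")
INTEGER = re.compile(r"(?<![0-9-])\b[0-9]+\b(?![0-9-])")

s = "keep -007 and 8- and 7-11 but drop 42, b3 stays"
print(repr(INTEGER.sub("", s)))
```

Without the \b anchors, the "3" in "b3" would match, because "b" is not in the lookbehind set.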
I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:
s = re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)
The above should also handle things like:
"This must not b3 delet3d, but the number at the end yes -134.411"
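A quick check of that pattern, with the input string passed as the third argument to re.sub. Tracing it shows that "-134.411" is actually removed in two pieces: " -134" by the \d+ branch (tried first), then ".411" by the \d*\.\d+ branch:

```python
import re

# Pattern from the answer above.
NUMBER = r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b"

s = "This must not b3 delet3d, but the number at the end yes -134.411"
print(repr(re.sub(NUMBER, "", s)))
print(repr(re.sub(NUMBER, "", "b3 costs 2.50 or -7")))

# findall returns (group 1, group 2) for each match, showing that
# "-134.411" is consumed as " -134" and then ".411".
print(re.findall(NUMBER, "yes -134.411"))
```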
But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.
Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.
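For str patterns in Python 3, \b, \w and \d are Unicode-aware by default, and the re.ASCII flag restricts them to ASCII. A small illustration with the Arabic-Indic digits four and two (U+0664, U+0662), which \d matches by default but not under re.ASCII:

```python
import re

s = "x \u0664\u0662 y"  # "x", then two Arabic-Indic digits, then "y"

# Default (Unicode) matching: \d matches any Unicode decimal digit.
print(repr(re.sub(r"\b\d+\b", "", s)))

# re.ASCII: \d means [0-9] only, so nothing is removed.
print(repr(re.sub(r"\b\d+\b", "", s, flags=re.ASCII)))
```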
If your number is always at the end of your strings, try:
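A minimal sketch for the end-of-string case, assuming the number is separated from the rest of the text by whitespace (the answer's exact pattern is not shown):

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# \s+\d+$ : whitespace, then digits, then the end of the string.
# Requiring the whitespace means a trailing "b3" is not touched.
print(repr(re.sub(r"\s+\d+$", "", s)))
print(repr(re.sub(r"\s+\d+$", "", "ends with b3")))
```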
Otherwise, you may try:
You can adjust the back-references to keep only one or two of the spaces (\s matches any whitespace separator).
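A sketch of the back-reference idea, assuming a pattern that captures the whitespace on each side of the number (hypothetical; the answer's exact pattern is not shown). Using \1 keeps one separator and \1\2 keeps both. A number at the very start or end of the string has no whitespace on one side and is not matched:

```python
import re

s = "keep b3 but drop 42 here"

# (\s)\d+(\s): a number with one whitespace character captured on each side.
one_space = re.sub(r"(\s)\d+(\s)", r"\1", s)     # keep only the left separator
both_spaces = re.sub(r"(\s)\d+(\s)", r"\1\2", s)  # keep both separators

print(repr(one_space))
print(repr(both_spaces))
```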
Non-regex solution:
Splits by " ", checks whether each chunk is a number by calling str.isdigit(), drops the chunks that are, and then joins the rest back together. More verbosely (not using a list comprehension):
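A sketch of both forms, assuming words are separated by single spaces. str.isdigit() is False for chunks like "-134.411" or "b3", so those are kept:

```python
s = "This must not b3 delet3d, but the number at the end yes 134411"

# Compact form: keep every space-separated chunk that is not all digits.
compact = " ".join([chunk for chunk in s.split(" ") if not chunk.isdigit()])

# Verbose form, without a list comprehension.
kept = []
for chunk in s.split(" "):
    if not chunk.isdigit():
        kept.append(chunk)
verbose = " ".join(kept)

print(repr(compact))
print(compact == verbose)
```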
I had a light-bulb moment; I tried this and it works:
Output:
"This must not b3 delet3d, but the number at the end yes "
This removes the digits at the end of the string.
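The code for this answer is not shown. One way to get exactly that output (trailing digits stripped, trailing space kept) is str.rstrip() with string.digits; this is an assumption, not necessarily what the answer used. It strips any trailing digits, even ones that belong to a word:

```python
import string

s = "This must not b3 delet3d, but the number at the end yes 134411"

# rstrip removes trailing characters that appear in "0123456789".
print(repr(s.rstrip(string.digits)))

# Caveat: digits at the end of a word are stripped too.
print(repr("keep b3".rstrip(string.digits)))
```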