Removing digits in Python (regex)

Posted 2024-07-18 06:26:00

I'm trying to delete all the numbers from a string.
However, the code below also deletes digits contained inside words, which is obviously not what I want.
I've tried many regular expressions with no success.

Thanks!

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes

Comments (11)

上课铃就是安魂曲 2024-07-25 06:26:00

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After reading the comments, I decided to write a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)
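For what it's worth, here is a quick sketch of how the combined pattern behaves. The two sample strings are made up for illustration and are not from the question. It suggests one gap: because `\s\d+\s` consumes the whitespace on both sides, the second of two adjacent numbers can be left behind.

```python
import re

# Pattern from the answer above, written as a raw string.
pattern = r"^\d+\s|\s\d+\s|\s\d+$"

# Hypothetical sample inputs, not taken from the original question.
mixed = "12 This must not b3 delet3d 4566 but 134411"
adjacent = "a 1 2 b"

cleaned_mixed = re.sub(pattern, " ", mixed)
cleaned_adjacent = re.sub(pattern, " ", adjacent)

print(repr(cleaned_mixed))     # numbers at the start, middle and end are gone
print(repr(cleaned_adjacent))  # the "2" survives: its leading space was already consumed
```
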
纸短情长 2024-07-25 06:26:00

Try this:

"\b\d+\b"

That'll match only those digits that are not part of another word.

旧梦荧光笔 2024-07-25 06:26:00

Matching a literal space isn't very good, since it doesn't handle tabs, et al. A first cut at a better solution is:

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape in string literals, and we want the regex word-boundary escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
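
A small sketch of both points, run against the string from the question. It assumes the first alternative of the fancier pattern is meant to anchor at the start of the string with `^`; the variable names are just for illustration.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# In a normal string literal "\b" is the single backspace character;
# in a raw string it stays as backslash + "b", which re treats as a word boundary.
plain_is_backspace = "\b" == "\x08"
raw_length = len(r"\b")

simple = re.sub(r"\b\d+\b", "", s)

# Assumption: the first alternative starts with ^ (start of string).
fancier = re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

print(plain_is_backspace, raw_length)
print(repr(simple))   # the space before the number is left behind
print(repr(fancier))  # the trailing separator is removed too
```
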

眉目亦如画i 2024-07-25 06:26:00

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)
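
A rough sketch of what this pattern does, using made-up inputs. Note that the `\W` in the group is part of the match, so the separator in front of the number is deleted as well; putting the group back with `r"\1"` is one possible way to keep it (my own variation, not part of the answer).

```python
import re

pattern = r"(^|\W)\d+"

# Hypothetical inputs for illustration only.
leading = re.sub(pattern, "", "134411 keep b3")
comma_lost = re.sub(pattern, "", "yes,134411 ok")

# Variation: re-insert whatever the group matched, so the separator survives.
comma_kept = re.sub(pattern, r"\1", "yes,134411 ok")

print(repr(leading))
print(repr(comma_lost))
print(repr(comma_kept))
```
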
老街孤人 2024-07-25 06:26:00

You could try this:

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

The same rule also applies to:

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'
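
One case this may not cover is a word that starts with digits, such as "3d". A possible tweak (my assumption, not part of the answer) is a negative lookahead `(?!\S)`, which requires the digits to be followed by whitespace or the end of the string. The input below is a made-up variation of the question's string.

```python
import re

# Hypothetical input: the question's string with a "3d" token added.
s = "This must not b3 delet3d, 4566 but 3d yes 134411"

plain = re.sub(r"\s\d+", "", s)

# (?!\S) means "not followed by a non-whitespace character".
guarded = re.sub(r"\s\d+(?!\S)", "", s)

print(repr(plain))    # " 3" is stripped out of " 3d"
print(repr(guarded))  # "3d" is kept
```
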
仅冇旳回忆 2024-07-25 06:26:00

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It does the right thing with this, matching only everything after million:

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

All of the other 8 regex answers on this page fail in various ways with that input.

The dash in the first set, (?<![0-9-]), leaves -007 alone, and the dash in the second set, (?![0-9-]), leaves 8- alone.

Or use \d in place of 0-9 if you prefer.

Can it be simplified?
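
For anyone who wants to check this locally rather than in an online tester, here is a minimal sketch that runs the pattern over the sample input above with `re.findall`. The variable names are mine.

```python
import re

pure_int = re.compile(r"\b(?<![0-9-])(\d+)(?![0-9-])\b")

# The two-line sample input from the answer.
sample = (
    "max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 \n"
    "8m9n o0p2     million     0 22 333  4444"
)

# With a single capture group, findall returns the captured digit runs.
matches = pure_int.findall(sample)
print(matches)
```
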

情绪操控生活 2024-07-25 06:26:00

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like,

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.
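
A small sketch of both remarks. `re.sub` needs the input string as its third argument, so it is passed explicitly here. The second half uses a made-up string to show how `\b` shifts with the character set: in Python 3, str patterns treat "é" as a word character unless `re.ASCII` is given.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes -134.411"
pattern = r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b"
cleaned = re.sub(pattern, "", s)
print(repr(cleaned))

# Hypothetical input: "\u00e9" is "é", followed by "3", a space and "7".
text = "\u00e93 7"

# Default (Unicode) matching: "é" is a word character, so "é3" is one word.
unicode_result = re.sub(r"\b\d+\b", "", text)

# ASCII-only matching: "é" is not a word character, so "3" stands alone.
ascii_result = re.sub(r"\b\d+\b", "", text, flags=re.ASCII)

print(repr(unicode_result), repr(ascii_result))
```
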

时间海 2024-07-25 06:26:00

If your number is always at the end of your strings, try:

re.sub(r"\d+$", "", s)

Otherwise, you may try:

re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep one or both of the whitespace characters (\s matches any whitespace character).
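
A short sketch of why raw strings matter for the replacement, and of adjusting the back-references. The input "a 12 b" is a made-up example.

```python
import re

# Without the r prefix, "\1\2" is two control characters, not back-references.
not_backrefs = "\1\2" == "\x01\x02"

keep_both = re.sub(r"(\s)\d+(\s)", r"\1\2", "a 12 b")  # both spaces kept
keep_one = re.sub(r"(\s)\d+(\s)", r"\1", "a 12 b")     # only the first space kept

at_end = re.sub(r"\d+$", "", "yes 134411")

print(not_backrefs, repr(keep_both), repr(keep_one), repr(at_end))
```
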

倒数 2024-07-25 06:26:00

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'
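
A possible variation, if tabs or repeated spaces can occur: calling `split()` with no argument splits on any run of whitespace. This is only a sketch with a made-up input, and it normalizes all whitespace to single spaces as a side effect.

```python
# Hypothetical input containing a tab and a double space.
s = "This must\tnot b3 delet3d,  but the number at the end yes 134411"

# split() with no argument splits on any whitespace and drops empty chunks.
cleaned = " ".join(word for word in s.split() if not word.isdigit())

print(repr(cleaned))
```
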

Splits by " ", and checks if the chunk is a number by doing str().isdigit(), then joins them back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)
时光暖心i 2024-07-25 06:26:00

I had a light-bulb moment; I tried this and it works:

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

output:

aasdsa
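
A quick sketch of what this character class actually matches, as far as I can tell. Inside the brackets, `~` is a literal, and `^` is also a literal because it is not the first character, so the class removes digits plus those two symbols. That also means digits inside words are removed, which is what the question wanted to avoid. The extra inputs are made up.

```python
import re

digits_only = re.sub(r"[0-9]", "", "aas30dsa20")   # same result as the answer's pattern here
with_extras = re.sub(r"[~^0-9]", "", "a~b^c1")     # "~" and "^" are stripped too
negated = re.sub(r"[^0-9]", "", "aas30dsa20")      # leading ^ negates: keeps only digits
in_words = re.sub(r"[~^0-9]", "", "b3 delet3d 134411")

print(digits_only, with_extras, negated, repr(in_words))
```
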
笑咖 2024-07-25 06:26:00
>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r"\d*$", "", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

This removes the digits at the end of the string.
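
If the leftover trailing space is unwanted, one possible variation (an assumption on my part, not from the answer) is to include optional whitespace before the digits. Using `\d+` rather than `\d*` also means the pattern only matches when there really are digits at the end.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# \s* also swallows the whitespace that separates the final number.
trimmed = re.sub(r"\s*\d+$", "", s)

# A string without trailing digits is left untouched.
unchanged = re.sub(r"\s*\d+$", "", "no digits here")

print(repr(trimmed), repr(unchanged))
```
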
