Delete lines containing numbers attached to letters in Python
I have a txt file containing one sentence per line, and there are lines containing numbers attached to letters. For instance:
The boy3 was strolling on the beach while four seagulls appeared flying.
There were 3 women sunbathing as well.
All children were playing happily.
I would like to remove lines like the first one (i.e. having numbers stuck to words) but not lines like the second, which are properly written.
Does anybody have an idea?
Comments (2)
You can use a simple regex pattern. We start with [0-9]+. This pattern matches one or more digits 0-9 in a row, so 6, 56, or 56790 all match. If you want to detect sentences that have numbers attached to a string, you could use something like this: ([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])
This regex matches a letter directly before a number or directly after one. You can search each line with this pattern and drop the lines where it matches. However, you can add more characters to the allowed letters if your sentences can include special characters and such.
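As a minimal sketch of how the pattern above might be applied in Python, the snippet below filters a list of lines with re.search. The names ATTACHED, keep_clean_lines and sample are made up for illustration and are not from the original answer.

```python
import re

# Pattern from the answer: a letter directly followed by digits,
# or digits directly followed by a letter.
ATTACHED = re.compile(r"([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])")

def keep_clean_lines(lines):
    """Return only the lines that have no digit glued to a letter."""
    return [line for line in lines if not ATTACHED.search(line)]

# The three example sentences from the question.
sample = [
    "The boy3 was strolling on the beach while four seagulls appeared flying.",
    "There were 3 women sunbathing as well.",
    "All children were playing happily.",
]

for line in keep_clean_lines(sample):
    print(line)
```

With the question's three sentences, only the "boy3" line should be dropped, because the standalone 3 in the second sentence has spaces on both sides.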
Suppose your input text is stored in the file in.txt. The pattern (?!\d)[\w] looks for word characters (\w) excluding digits, so it can stand in for "a letter" when checking whether a digit sits directly next to one. The idea is borrowed from https://stackoverflow.com/a/12349464/2740367