Remove lines containing numbers attached to letters in Python

Posted on 2025-02-02 15:35:35 | Length: 309 | Views: 6 | Comments: 0

I have a txt file containing one sentence per line, and there are lines containing numbers attached to letters. For instance:

The boy3 was strolling on the beach while four seagulls appeared flying.
There were 3 women sunbathing as well.
All children were playing happily.

I would like to remove lines like the first one (i.e. lines with numbers stuck to words) but keep lines like the second, which are properly written.

Does anybody have an idea?

Comments (2)

女皇必胜 2025-02-09 15:35:35

You can use a simple regex pattern. We start with [0-9]+. This pattern matches one or more digits in a row, so 6, 56, or 56790 all match. To detect sentences in which a number is attached to a word, you could use something like this: ([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z]) This regex matches a letter immediately followed by digits, or digits immediately followed by a letter. You can search strings using:

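import re

lines = [
    'The boy3 was strolling on the beach while 4 seagulls appeared flying.',
    'There were 3 women sunbathing as well.',
]

kept = []
for line in lines:
    # A match means a digit is attached to a letter somewhere in the line.
    res = re.search(r"([a-zA-Z][0-9]+)|([0-9]+[a-zA-Z])", line)
    if res is None:
        # No match: the line is properly written, so keep it.
        kept.append(line)

print(kept)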

However, if your sentences can include accented letters or other special characters, you can add those characters to the letter classes ([a-zA-Z]) in the pattern.
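As a minimal sketch of that extension, assuming Python 3 (where \w is Unicode-aware for str patterns), the class [^\W\d_] can stand in for [a-zA-Z] so that accented letters also count as letters. The names has_attached_digit and keep_clean_lines and the fourth sample sentence are illustrative only, not part of the answer.

```python
import re

# [^\W\d_] matches a word character that is neither a digit nor an underscore,
# i.e. a letter, including non-ASCII letters such as "é".
LETTER = r"[^\W\d_]"
ATTACHED = re.compile(rf"{LETTER}\d|\d{LETTER}")


def has_attached_digit(line: str) -> bool:
    """Return True if a digit directly touches a letter anywhere in the line."""
    return ATTACHED.search(line) is not None


def keep_clean_lines(lines):
    """Keep only the lines in which no digit is attached to a letter."""
    return [line for line in lines if not has_attached_digit(line)]


sample = [
    "The boy3 was strolling on the beach while four seagulls appeared flying.",
    "There were 3 women sunbathing as well.",
    "All children were playing happily.",
    "Le café2 était fermé.",  # hypothetical line with an accented letter
]

for line in keep_clean_lines(sample):
    print(line)
```

Note that this also drops lines containing ordinals such as "3rd", since a digit touches a letter there as well; whether that is wanted depends on the data.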

无需解释 2025-02-09 15:35:35

Suppose your input text is stored in the file in.txt. You can then use the following code:

import re

with open("in.txt", "r") as f:
    for line in f:
        if not re.search(r'(?!\d)[\w]\d|\d(?!\d)[\w]', line, flags=re.UNICODE):
            print(line, end="")

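The pattern (?!\d)[\w] matches a word character (\w) that is not a digit: the negative lookahead (?!\d) rejects digits before [\w] consumes the character. Note that \w also includes the underscore. The idea is borrowed from https://stackoverflow.com/a/12349464/2740367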
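If the surviving lines should go to a new file rather than to standard output, a sketch along the following lines may help. It reuses the pattern from this answer; the function name filter_file, the UTF-8 encoding, and the output file name out.txt are assumptions, not part of the answer.

```python
import re

# Same pattern as above: a non-digit word character next to a digit.
ATTACHED = re.compile(r"(?!\d)[\w]\d|\d(?!\d)[\w]")


def filter_file(src_path: str, dst_path: str) -> int:
    """Copy lines without attached digits from src_path to dst_path.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if ATTACHED.search(line):
                dropped += 1  # a digit is attached to a word: skip the line
            else:
                dst.write(line)  # line already ends with its newline
    return dropped

# Example call (file names are assumptions):
# filter_file("in.txt", "out.txt")
```

Writing to a separate file leaves in.txt untouched, so the result can be checked before the original is replaced.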
