在 Python 中,如何从列表中删除包含某些类型字符的任何元素?

发布于 2024-11-28 16:04:34 字数 1114 浏览 8 评论 0原文

抱歉,如果这是一个简单的问题,我对此还很陌生,但我花了一段时间寻找答案,但没有找到任何东西。我有一个看起来像这样可怕的混乱的列表:

['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']

我需要处理它,以便 HTML.py 可以将里面的信息变成表格。由于某种原因,HTML.py 根本无法处理怪物元素(例如 'class="lsn-serpListRadius lsn-fr">.2 Miles} 更多信息您的列表地图 if (typeof(serps) !== \ '未定义\')serps.arrArticleIds.push(\'4603114\')'等)。对我来说幸运的是,我实际上并不关心怪物元素中的信息,而是想摆脱它们。

我尝试编写一个正则表达式来匹配所有两个以上字母的全大写单词,以识别怪物元素,并得到了这个:

re.compile('[^a-z]*[A-Z][^a-z]*\w{3,}')

但我不知道如何将其应用于删除包含与该正则表达式匹配的元素从列表中。我该怎么做/这是正确的方法吗?

Apologies if this is a simple question, I'm still pretty new to this, but I've spent a while looking for an answer and haven't found anything. I have a list that looks something like this horrifying mess:

['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']

And I need to process it so that HTML.py can turn the information in it into a table. For some reason, HTML.py simply can't handle the monster elements (eg. 'class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', etc). Fortunately for me, I don't actually care about the information in the monster elements and want to get rid of them.

I tried writing a regex that would match all more-than-two-letter all-caps words, to identify the monster elements, and got this:

re.compile('[^a-z]*[A-Z][^a-z]*\w{3,}')

But I don't know how to apply that to deleting the elements containing matches to that regex from the list. How would I do that/is that the right way to go about it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

遇到 2024-12-05 16:04:35

没有正则表达式

def isNotMonster(x):
    return not any((len(word) > 2) and (word == word.upper()) for word in x.split())

okay_items = filter(isNotMonster, all_items)

without regex

def isNotMonster(x):
    return not any((len(word) > 2) and (word == word.upper()) for word in x.split())

okay_items = filter(isNotMonster, all_items)
﹂绝世的画 2024-12-05 16:04:35
element = 'string_to_search'
for item in y_list_of_items:
    if element in item:
        y_list_of_items.remove(item)
element = 'string_to_search'
for item in y_list_of_items:
    if element in item:
        y_list_of_items.remove(item)
宁愿没拥抱 2024-12-05 16:04:34

我认为您的正则表达式不正确,要匹配包含三个或更多字符的全大写单词的所有条目,您应该在 re.search 中使用类似的内容:

regex = re.compile(r'\b[A-Z]{3,}\b')

这样您就可以使用列表理解进行过滤或 filter 内置函数:

full = ['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']
regex = re.compile(r'\b[A-Z]{3,}\b')
# use only one of the following lines, whichever you prefer
filtered = filter(lambda i: not regex.search(i), full)
filtered = [i for i in full if not regex.search(i)]

结果如下列表(我认为这就是您正在寻找的内容:

>>> pprint.pprint(filtered)
['Organization name} ',
 '> (777) 777-7777} ',
 ' class="lsn-mB6 adr">1 Address, MA 02114 } ',
 'Other organization} ',
 '> (555) 555-5555} ',
 ' class="lsn-mB6 adr">301 Address, MA 02121 } ',
 'Organization} ']

I think your regex is incorrect, to match all entries that contain all-cap words with three or more characters, you should use something like this with re.search:

regex = re.compile(r'\b[A-Z]{3,}\b')

With that you can filter using a list comprehension or the filter built-in function:

full = ['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']
regex = re.compile(r'\b[A-Z]{3,}\b')
# use only one of the following lines, whichever you prefer
filtered = filter(lambda i: not regex.search(i), full)
filtered = [i for i in full if not regex.search(i)]

Results in the following list (which I think is what you are looking for:

>>> pprint.pprint(filtered)
['Organization name} ',
 '> (777) 777-7777} ',
 ' class="lsn-mB6 adr">1 Address, MA 02114 } ',
 'Other organization} ',
 '> (555) 555-5555} ',
 ' class="lsn-mB6 adr">301 Address, MA 02121 } ',
 'Organization} ']
疯了 2024-12-05 16:04:34

首先,存储您的正则表达式,然后使用列表理解:

regex = re.compile('[^a-z]*[A-Z][^a-z]*\w{3,}')
okay_items = [x for x in all_items if not regex.match(x)]

First, store your regex, then use a list comprehension:

regex = re.compile('[^a-z]*[A-Z][^a-z]*\w{3,}')
okay_items = [x for x in all_items if not regex.match(x)]
妄司 2024-12-05 16:04:34

或者完全相同但不编译正则表达式:

from re import match

ll = ['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']

filteredData = [x for x in ll if not match(r'[^a-z]*[A-Z][^a-z]*\w{3,}', x)]

编辑:

from re import compile

rex = compile('[^a-z]*[A-Z][^a-z]*\w{3,}')
filteredData = [x for x in ll if not rex.match(x)]

Or the very same but without compiling regex:

from re import match

ll = ['Organization name} ', '> (777) 777-7777} ', ' class="lsn-mB6 adr">1 Address, MA 02114 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4603114\'); ', 'Other organization} ', '> (555) 555-5555} ', ' class="lsn-mB6 adr">301 Address, MA 02121 } ', ' class="lsn-serpListRadius lsn-fr">.2 Miles} MORE INFO CLAIM YOUR LISTING MAP if (typeof(serps) !== \'undefined\') serps.arrArticleIds.push(\'4715945\'); ', 'Organization} ']

filteredData = [x for x in ll if not match(r'[^a-z]*[A-Z][^a-z]*\w{3,}', x)]

Edited:

from re import compile

rex = compile('[^a-z]*[A-Z][^a-z]*\w{3,}')
filteredData = [x for x in ll if not rex.match(x)]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文