Efficient processing of an email list in Python

Posted on 2024-12-02 08:06:19

I have a very long list of email addresses that I would like to process to:

separate good (valid) emails from bad ones, and
remove duplicates while keeping all the non-duplicates in their original order.

This is what I have so far:

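from django.core.validators import email_re  # regex used to validate addresses

email_list = ["user@example.com", "invalid_email"]  # ... and so on
email_set = set()    # addresses seen so far, for fast duplicate checks
bad_emails = []
good_emails = []
dups = False         # set to True if any duplicate was skipped
for email in email_list:
    if email in email_set:
        # Skip every occurrence after the first, so the original order is kept.
        dups = True
        continue
    email_set.add(email)
    if email_re.match(email):
        good_emails.append(email)
    else:
        bad_emails.append(email)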

I would like this chunk of code to be as fast as possible and, as a secondary concern, to use as little memory as possible. Is there a way to improve this in Python? Maybe using list comprehensions or iterators?
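One way to use an iterator here is a generator that yields each address only the first time it appears, feeding a single partitioning loop. The sketch below assumes a hypothetical stand-in regex `EMAIL_RE` (it is not Django's `email_re`, which you would pass in instead), and the names `unique_in_order` and `split_emails` are illustrative. It sticks to constructs that also exist in Python 2.5 (generators, `set`, `re.compile`). The local aliases such as `match = email_re.match` are a common micro-optimization; whether they help measurably depends on the interpreter, so time it on your own data.

```python
import re

# Hypothetical stand-in for django.core.validators.email_re, used only so the
# sketch runs without Django. Deliberately loose: exactly one "@", no
# whitespace, and a dot somewhere after the "@".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def unique_in_order(items):
    """Yield each item the first time it is seen, preserving input order."""
    seen = set()
    seen_add = seen.add  # local alias avoids an attribute lookup per item
    for item in items:
        if item not in seen:
            seen_add(item)
            yield item


def split_emails(email_list, email_re=EMAIL_RE):
    """Return (good, bad) lists of unique addresses in original order."""
    match = email_re.match  # look up the bound method once, not per email
    good = []
    bad = []
    good_append = good.append
    bad_append = bad.append
    for email in unique_in_order(email_list):
        if match(email):
            good_append(email)
        else:
            bad_append(email)
    return good, bad


good, bad = split_emails(
    ["a@example.com", "invalid_email", "a@example.com", "b@example.org"])
```

Here `good` ends up as `["a@example.com", "b@example.org"]` and `bad` as `["invalid_email"]`. The generator avoids building an intermediate de-duplicated list, but the `seen` set still holds every unique address, so peak memory is similar to the original loop. If the `dups` flag is still needed, it is equivalent to `len(good) + len(bad) < len(email_list)`.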

EDIT: Sorry! I forgot to mention that this is Python 2.5, since this is for GAE.

email_re is from django.core.validators

深居我梦 2024-12-09 08:06:19

Look at the question "Does Python have an ordered set?" and pick an implementation you like.

So just:

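# OrderedSet is not built in: define or import one of the implementations
# from "Does Python have an ordered set?" before running this.
from django.core.validators import email_re

# Building the OrderedSet drops duplicates while keeping first-seen order.
email_list = OrderedSet(["user@example.com", "invalid_email"])  # ... and so on

bad_emails = []
good_emails = []

for email in email_list:
    if email_re.match(email):
        good_emails.append(email)
    else:
        bad_emails.append(email)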

This is probably the fastest and simplest solution you can achieve.
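The answer leaves the choice of `OrderedSet` open. As a hypothetical minimal sketch (not one of the recipes from the linked question), an insertion-ordered set only needs a list for order plus a set for membership, which also works on Python 2.5. It supports just what the loop above uses (construction, iteration, `len`, `in`); removal is intentionally left out to keep it simple.

```python
class OrderedSet(object):
    """Minimal insertion-ordered set: add, membership, len and iteration."""

    def __init__(self, iterable=()):
        self._items = []    # keeps elements in first-seen order
        self._seen = set()  # average O(1) membership checks
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Only the first occurrence of an item is recorded.
        if item not in self._seen:
            self._seen.add(item)
            self._items.append(item)

    def __contains__(self, item):
        return item in self._seen

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


emails = OrderedSet(["b@example.com", "a@example.com", "b@example.com"])
```

Iterating over `emails` gives `"b@example.com"` then `"a@example.com"`. The list and the set hold references to the same string objects rather than copies, so the extra memory is roughly that of the question's `email_set`; the answer's loop then runs unchanged on top of it.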
