Efficient processing of an email list in Python

Posted on 2024-12-02 08:06:19

I have a very long list of emails that I would like to process to:

  1. separate good emails from bad emails, and
  2. remove duplicates but keep all the non-duplicates in the same order.

This is what I have so far:

from django.core.validators import email_re

email_list = ["[email protected]", "invalid_email", ...]
email_set = set()   # addresses seen so far
bad_emails = []
good_emails = []
dups = False        # becomes True if any duplicate is found
for email in email_list:
    if email in email_set:
        dups = True
        continue
    email_set.add(email)
    if email_re.match(email):
        good_emails.append(email)
    else:
        bad_emails.append(email)

I would like this chunk of code to be as fast as possible and, less importantly, to use as little memory as possible. Is there a way to improve this in Python? Maybe using list comprehensions or iterators?

EDIT: Sorry! I forgot to mention that this is Python 2.5, since this is for GAE.

email_re is from django.core.validators.

Comments (2)

深居我梦 2024-12-09 08:06:19

See the question "Does Python have an ordered set?" and pick an implementation you like.

Then it is just:

# OrderedSet is not in the standard library; it comes from whichever implementation you picked.
email_list = OrderedSet(["[email protected]", "invalid_email", ...])

bad_emails = []
good_emails = []

for email in email_list:
    if email_re.match(email):
        good_emails.append(email)
    else:
        bad_emails.append(email)

This is probably the fastest and simplest solution you can achieve.
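
For readers who want to see what such a class involves, here is a minimal sketch of an insertion-ordered set. It is a hypothetical illustration, not one of the implementations from the linked question: it simply pairs a list (for order) with a set (for membership tests), and the example addresses are made up.

```python
class OrderedSet(object):
    """Minimal insertion-ordered set: a list keeps order, a set gives fast lookups."""

    def __init__(self, iterable=()):
        self._items = []    # unique items in first-seen order
        self._seen = set()  # same items, for O(1) average membership tests
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Only the first occurrence of an item is recorded.
        if item not in self._seen:
            self._seen.add(item)
            self._items.append(item)

    def __contains__(self, item):
        return item in self._seen

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# Hypothetical sample data, for illustration only.
unique = list(OrderedSet(["b@example.com", "a@example.com", "b@example.com"]))
```

Note that this does the same set lookup and list append per item as the loop in the question, plus method-call overhead, so it is mainly a readability choice.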

花之痕靓丽 2024-12-09 08:06:19

I can't think of any way to speed up what you have. A set is a fast way to keep track of the addresses you have already seen, and appending to a list is a fast way to collect the results.

I like the OrderedSet solution, but I doubt a Python implementation of OrderedSet would be faster than what you wrote.

You could use an OrderedDict to solve this problem, but that was only added in Python 2.7. You could use a recipe (like: http://code.activestate.com/recipes/576693/) to add OrderedDict, but again I don't think it would be any faster than what you have.
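
A rough sketch of the OrderedDict idea, assuming Python 2.7 or later (so not the Python 2.5 runtime from the question unless a backport recipe is installed). `EMAIL_RE` is a simplified stand-in for Django's `email_re`, and the sample addresses are made up for illustration.

```python
import re
from collections import OrderedDict

# Simplified stand-in for django.core.validators.email_re (an assumption,
# not Django's actual pattern).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical sample data.
emails = ["a@example.com", "invalid_email", "a@example.com", "b@example.com"]

# OrderedDict.fromkeys keeps the first occurrence of each key, in order.
unique_emails = list(OrderedDict.fromkeys(emails))

# Two list comprehensions: concise, but each address is matched twice.
good_emails = [e for e in unique_emails if EMAIL_RE.match(e)]
bad_emails = [e for e in unique_emails if not EMAIL_RE.match(e)]
```

The comprehensions answer the "list comprehensions" part of the question, at the cost of running the regex twice per address; a single loop with an if/else avoids that.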

I'm trying to think of a Python module that is implemented in C to solve this problem. I think that's the only hope of beating your code. But I haven't thought of anything.

If you can get rid of the dups flag, it will be faster simply by running less Python code.
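
As a sketch of what that could look like: the loop below drops the flag and derives "were there duplicates?" from the sizes afterwards. Binding the methods to local names is an extra micro-optimization I am adding on my own assumption; measure it with timeit before relying on it. `EMAIL_RE` is a simplified stand-in for Django's `email_re`, and `split_emails` is a hypothetical helper name.

```python
import re

# Simplified stand-in for django.core.validators.email_re (an assumption,
# not Django's actual pattern).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_emails(email_list):
    """Return (good, bad, had_dups), deduplicated and in first-seen order."""
    seen = set()
    good_emails, bad_emails = [], []
    # Local aliases avoid repeated attribute lookups inside the loop.
    seen_add = seen.add
    is_valid = EMAIL_RE.match
    add_good = good_emails.append
    add_bad = bad_emails.append
    for email in email_list:
        if email in seen:
            continue
        seen_add(email)
        if is_valid(email):
            add_good(email)
        else:
            add_bad(email)
    # No flag in the loop: duplicates existed if some items were skipped.
    had_dups = len(seen) != len(email_list)
    return good_emails, bad_emails, had_dups


# Hypothetical sample data.
good, bad, had_dups = split_emails(
    ["a@example.com", "invalid_email", "a@example.com", "b@example.com"]
)
```

The length comparison assumes `email_list` is a list (or anything with `len`), as in the question; for a plain iterator you would need to count items yourself.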

Interesting question. Good luck.
