How to parse the Cc field of an email header?

Posted 2024-10-25 23:29:32

I have the plain text of a Cc header field that looks like so:


[email protected], John Smith <[email protected]>,"Smith, Jane" <[email protected]>

Are there any battle tested modules for parsing this properly?

(Bonus if it's in Python! The email module just returns the raw text without any methods for splitting it, AFAIK.)
(Also a bonus if it splits the name and the address into two fields.)

Comments (4)

方觉久 2024-11-01 23:29:32

There are a bunch of functions available in the standard Python library, but I think you're looking for
email.utils.parseaddr() or email.utils.getaddresses()

>>> import email.utils
>>> addresses = '[email protected], John Smith <[email protected]>,"Smith, Jane" <[email protected]>'
>>> email.utils.getaddresses([addresses])
[('', '[email protected]'), ('John Smith', '[email protected]'), ('Smith, Jane', '[email protected]')]
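
To make this concrete, here is a minimal sketch of pulling the Cc field out of a parsed message and splitting it with getaddresses(). The message text and the example.com addresses are made up for illustration (the addresses in the question are obscured), so treat it as a sketch rather than the definitive way to do it.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical raw message; only the Cc header matters here.
raw = (
    "From: sender@example.com\n"
    'Cc: alice@example.com, John Smith <john@example.com>,"Smith, Jane" <jane@example.com>\n'
    "\n"
    "body\n"
)

msg = message_from_string(raw)

# get_all() returns every Cc header as a list of raw strings
# (or the given default when the header is absent).
cc_values = msg.get_all("Cc", [])

# getaddresses() takes a list of header values and returns (name, address) pairs.
pairs = getaddresses(cc_values)

# parseaddr() does the same for a string holding a single address.
single = parseaddr('"Smith, Jane" <jane@example.com>')

print(pairs)
print(single)
```

Each pair already has the display name and the address as separate fields, and the comma inside the quoted name does not split the entry.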

守不住的情 2024-11-01 23:29:32

I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.

╭ゆ眷念 2024-11-01 23:29:32

The code below is completely unnecessary. I wrote it before realising that you could pass getaddresses() a list containing a single string that itself contains multiple addresses.

I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, making sure to ignore commas that are within quotes (and therefore part of a name).

from email.utils import getaddresses

addrstring = ',[email protected], John Smith <[email protected]>,"Smith, Jane" <[email protected]>,'

def addrparser(addrstring):
    addrlist = ['']
    quoted = False

    # ignore comma at beginning or end
    addrstring = addrstring.strip(',')

    for char in addrstring:
        if char == '"':
            # toggle quoted mode
            quoted = not quoted
            addrlist[-1] += char
        # a comma outside of quotes means a new address
        elif char == ',' and not quoted:
            addrlist.append('')
        # anything else is the next letter of the current address
        else:
            addrlist[-1] += char

    return getaddresses(addrlist)

print(addrparser(addrstring))

Gives:

[('', '[email protected]'), ('John Smith', '[email protected]'),
 ('Smith, Jane', '[email protected]')]

I'd be interested to see how other people would go about this problem!
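
One other way to go about it, offered as a hedged sketch: on Python 3, parsing the message with email.policy.default turns address headers into structured objects, so the Cc header exposes an addresses attribute whose items carry display_name and addr_spec. The message text and example.com addresses below are invented for illustration.

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message; the addresses are placeholders.
raw = (
    "From: sender@example.com\n"
    'Cc: alice@example.com, John Smith <john@example.com>,"Smith, Jane" <jane@example.com>\n'
    "\n"
    "body\n"
)

# With policy=default, header values are parsed into header objects
# instead of being returned as plain strings.
msg = message_from_string(raw, policy=default)

# Each entry in .addresses is an Address object with separate name and address parts.
fields = [(addr.display_name, addr.addr_spec) for addr in msg["Cc"].addresses]

for name, address in fields:
    print(repr(name), address)
```

Compared with the hand-written splitter, this leaves quoting and comma handling to the standard library parser.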

晚风撩人 2024-11-01 23:29:32

Convert a string containing multiple email addresses (each with a name) into a dictionary.

emailstring = 'Friends <[email protected]>, John Smith <[email protected]>,"Smith" <[email protected]>'

Split the string on commas:

email_list = emailstring.split(',')

Use each name as the key and its email address as the value to build the dictionary:

import email.utils
email_dict = dict(map(email.utils.parseaddr, email_list))

The result looks like this:

{'John Smith': '[email protected]', 'Friends': '[email protected]', 'Smith': '[email protected]'}

Note:

If the same name appears with different email addresses, only one of those records is kept, because dictionary keys are unique.

'Friends <[email protected]>, John Smith <[email protected]>,"Smith" <[email protected]>, Friends <[email protected]>'

Here "Friends" appears twice.

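The two limitations of this approach (a plain split(',') also cuts a quoted name such as "Smith, Jane" in half, and a dict keeps only one address per name) can be sidestepped with a small variation. This is a sketch under assumptions: the example.com addresses and the by_name variable are made up for illustration, and getaddresses() is swapped in for the manual split.

```python
from collections import defaultdict
from email.utils import getaddresses

# Hypothetical input: one quoted name containing a comma and one repeated name.
emailstring = (
    'Friends <friends@example.com>, John Smith <john@example.com>,'
    '"Smith, Jane" <jane@example.com>, Friends <other@example.com>'
)

# Map each display name to a list of addresses so duplicates are not lost.
by_name = defaultdict(list)

# getaddresses() splits the string itself and respects quoted commas.
for name, address in getaddresses([emailstring]):
    by_name[name].append(address)

print(dict(by_name))
```

Keeping a list per name is just one possible design; a list of (name, address) tuples works equally well when the order of recipients matters.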