Parsing multipart emails in Python and saving attachments

Posted on 2024-11-14 02:21:14

I am pretty new to Python and I am trying to parse email from Gmail using Python's imaplib and email modules. It is working pretty well, but I am having issues with email attachments.

I would like to extract all of the plain text from an email, ignore any HTML that may be included as a secondary content type, and strip out and save all other attachments. I have been trying the following:

...imaplib connection and mailbox selection...

typ, msg_data = c.fetch(num, '(RFC822)')
email_body = msg_data[0][1]
mail = email.message_from_string(email_body)
for part in mail.walk():
    if part.get_content_type() == 'text/plain':
        body = body + '\n' + part.get_payload()
    else:
        continue

This was my original attempt to take just the plain-text portions of an email, but when someone sends an email with a text attachment, the contents of the text file also show up in the 'body' variable above.

Can someone tell me how I can extract the plain-text portions of an email while ignoring the secondary HTML that is sometimes present, and also save all other types of file attachments as files? I apologize if this doesn't make a lot of sense. I will update the question with more clarification if needed.

Comments (1)

夜空下最亮的亮点 2024-11-21 02:21:14

If you just need to keep text attachments out of the body variable with what you have there, it should be as simple as this:

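# On Python 3, c.fetch() returns bytes, so use email.message_from_bytes() here.
mail = email.message_from_string(email_body)
for part in mail.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')

    # Only text/plain parts without a Content-Disposition header count as body text.
    if c_type == 'text/plain' and c_disp is None:
        body = body + '\n' + part.get_payload()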

Then, if the Content-Disposition header indicates that a part is an attachment, you should be able to use part.get_filename() and part.get_payload() to handle the file. I don't know whether any of this can vary, but it's basically what I've used in the past to interface with my mail server.
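
To make the attachment-handling step concrete, here is a minimal sketch of how the body text and the attachments could be separated in one pass. It assumes Python 3.5 or later (for get_content_disposition()) and raw message bytes such as those returned by imaplib. The names split_message and save_dir, and the fallback file name used for parts without a name, are hypothetical and only for illustration. Treating any part that has a filename as an attachment is a design choice of this sketch, not something the approach above requires.

```python
import email
import os
import tempfile
from email.message import EmailMessage


def split_message(raw_bytes, save_dir):
    """Return (plain-text body, list of saved attachment paths).

    split_message and save_dir are illustrative names, not part of any library.
    """
    mail = email.message_from_bytes(raw_bytes)
    body_parts = []
    saved = []
    for index, part in enumerate(mail.walk()):
        if part.is_multipart():
            continue  # container parts hold other parts, not content
        disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
        filename = part.get_filename()
        if disposition == 'attachment' or filename:
            # basename() drops any directory components from the supplied name;
            # the 'part-N.bin' fallback is an assumption of this sketch.
            name = os.path.basename(filename or '') or 'part-%d.bin' % index
            path = os.path.join(save_dir, name)
            with open(path, 'wb') as fh:
                # decode=True undoes base64 / quoted-printable transfer encoding
                fh.write(part.get_payload(decode=True) or b'')
            saved.append(path)
        elif part.get_content_type() == 'text/plain':
            charset = part.get_content_charset() or 'utf-8'
            text = part.get_payload(decode=True).decode(charset, errors='replace')
            body_parts.append(text)
        # everything else (for example the text/html alternative) is ignored
    return '\n'.join(body_parts), saved


if __name__ == '__main__':
    # Build a small test message: plain body, HTML alternative, text attachment.
    demo = EmailMessage()
    demo.set_content('Hello from the body')
    demo.add_alternative('<p>Hello from the body</p>', subtype='html')
    demo.add_attachment('notes in a file\n', filename='notes.txt')

    body, files = split_message(demo.as_bytes(), tempfile.mkdtemp())
    print(body)
    print(files)
```

Unlike the check for a missing Content-Disposition header, this version still counts a text/plain part marked "inline" as body text, as long as it carries no filename. Whether that is the right call depends on the mail you receive.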
