Email body is sometimes a string and sometimes a list. Why?

Posted on 2024-07-14 19:39:27, 667 words, 13 views, 0 comments

My application is written in Python. It runs a script on every email received by Postfix and does something with the email content. Procmail is responsible for running the script, passing it the email as input. The problem started when I converted the input message (which may be plain text) into an email message object, because the object is more convenient to work with. I am using email.message_from_string (where email is the standard email module that ships with Python).


import email
message = email.message_from_string(original_mail_content)
message_body = message.get_payload()

message_body is sometimes a list, [email.message.Message instance, email.message.Message instance], and sometimes a string (the actual body content of the incoming email). Why is that? I also noticed one more thing: while browsing the email.message.Message.get_payload() docstring, I found this:
"""
The payload will either be a list object or a string. If you mutate
the list object, you modify the message's payload in place....."""

So is there a generic way to get the body of an email in Python? Please help me out.

Comments (4)

岁月蹉跎了容颜 2024-07-21 19:39:27

Well, the other answers are correct and you should read the docs, but here is an example of a generic approach:

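def get_first_text_part(msg):
    # Return the payload of the first text/* part, or None if there is none.
    maintype = msg.get_content_maintype()
    if maintype == 'multipart':
        # For a multipart message, get_payload() is a list of sub-messages.
        for part in msg.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        # For a single-part text message, get_payload() is the body string.
        return msg.get_payload()
    return None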

This is fragile: the parts themselves may be multipart, in which case nothing nested inside them is examined, and the function only ever returns the first text part, which may not be the one you want. Still, it is something you can play with.
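One way to handle the nesting problem mentioned above is to recurse into multipart parts. The following is only a sketch, assuming Python 3; the name first_text_part_recursive and the sample message are made up for illustration and are not part of the original answer.

```python
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional


def first_text_part_recursive(msg: Message) -> Optional[str]:
    """Return the payload of the first text/* part at any nesting depth."""
    if msg.is_multipart():
        # A multipart payload is a list of sub-messages: search each in order.
        for part in msg.get_payload():
            found = first_text_part_recursive(part)
            if found is not None:
                return found
        return None
    if msg.get_content_maintype() == "text":
        # A single-part text message: the payload is the body string.
        return msg.get_payload()
    return None


# Hypothetical sample: multipart/mixed holding a multipart/alternative
# (plain + HTML) and a binary attachment.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("plain body", "plain"))
alternative.attach(MIMEText("<p>html body</p>", "html"))

outer = MIMEMultipart("mixed")
outer.attach(alternative)
outer.attach(MIMEApplication(b"\x00\x01", "octet-stream"))

print(first_text_part_recursive(outer))
```

With this sample, the non-recursive version finds no text part at the top level, because the text lives one level down inside the multipart/alternative container. Note that the payload is still returned undecoded, exactly as in the original function.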

终弃我 2024-07-21 19:39:27

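As crazy as it might seem, the reason for the sometimes-string, sometimes-list semantics is given in the documentation. Basically, the payload of a multipart message is returned as a list of Message objects, while the payload of a single-part message is returned as a string.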
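A small demonstration of that behaviour, assuming Python 3. The two raw messages below are invented for illustration; only email.message_from_string, is_multipart(), get_payload() and get_content_type() come from the standard library.

```python
import email

# A single-part message: headers, blank line, body.
plain_source = "Subject: hi\n\njust one body\n"

# A multipart message with two alternative renderings of the same content.
multipart_source = (
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain version\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html version</p>\n"
    "--XYZ--\n"
)

plain = email.message_from_string(plain_source)
multi = email.message_from_string(multipart_source)

# is_multipart() tells you in advance which kind of payload to expect.
print(plain.is_multipart(), type(plain.get_payload()).__name__)
print(multi.is_multipart(), type(multi.get_payload()).__name__)
print([part.get_content_type() for part in multi.get_payload()])
```

Checking is_multipart() before calling get_payload() avoids having to test the type of the returned value.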

叹梦 2024-07-21 19:39:27

Rather than simply looking for a sub-part, use walk() to iterate through the message contents:

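def walkMsg(msg):
  # Yield the decoded payload of every non-container part.
  for part in msg.walk():
    # Any multipart/* part is only a container; its decoded payload is None.
    if part.get_content_maintype() == "multipart":
      continue
    yield part.get_payload(decode=True)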

The walk() method returns an iterator that you can loop over (it is a generator). If the message is not a container of parts (i.e. it has no attachments or alternatives), walk() yields a single element: the message itself.

You want to skip any 'multipart' parts, as they are just glue holding the real parts together.
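To see what walk() actually visits, here is a short sketch, assuming Python 3. The messages are built with the standard email.mime classes purely for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A message that is not a container: walk() visits only the message itself.
single = MIMEText("only part")

# A container with two children: walk() visits the container first,
# then each child in order (depth-first).
container = MIMEMultipart("alternative")
container.attach(MIMEText("plain", "plain"))
container.attach(MIMEText("<p>html</p>", "html"))

single_types = [part.get_content_type() for part in single.walk()]
container_types = [part.get_content_type() for part in container.walk()]

print(single_types)
print(container_types)

# The container has no body of its own, which is why it is skipped.
print(container.get_payload(decode=True))
```

The last line prints None, since get_payload(decode=True) has nothing to decode for a multipart part.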

The above function yields the decoded payload of every non-container part, attachments included. You may want to narrow it down to the text parts only, if those contain the information you are looking for.
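One possible way to restrict the walk to text parts is sketched below, assuming Python 3. The function name iter_text_bodies, the choice to skip parts marked as attachments, and the utf-8 fallback are assumptions for illustration, not something the original answer specifies.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Iterator


def iter_text_bodies(msg: Message) -> Iterator[str]:
    """Yield each inline text/* part of msg as a decoded str."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary parts
        if part.get_content_disposition() == "attachment":
            continue  # assumption: text attachments are not "the body"
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if raw is None:
            continue
        # Assumption: fall back to utf-8 when the charset is missing or unknown.
        charset = part.get_content_charset() or "utf-8"
        try:
            yield raw.decode(charset, errors="replace")
        except LookupError:
            yield raw.decode("utf-8", errors="replace")


# Hypothetical sample: two inline text parts and one text attachment.
sample = MIMEMultipart("mixed")
sample.attach(MIMEText("h\u00e9llo", "plain", "utf-8"))
sample.attach(MIMEText("<p>html</p>", "html"))
log = MIMEText("log data", "plain")
log.add_header("Content-Disposition", "attachment", filename="log.txt")
sample.attach(log)

bodies = list(iter_text_bodies(sample))
print(bodies)
```

Unlike walkMsg, this yields str rather than bytes, because each payload is decoded with the charset declared on its own part.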

Note that as of Python 2.5, the methods get_type(), get_main_type(), and get_subtype() have been removed; use get_content_type(), get_content_maintype(), and get_content_subtype() instead. See http://docs.python.org/library/email.message.html#email.message.Message.walk
