The proper PHP way to parse email attachments from the EML format

Posted on 2024-10-14 17:39:38

I have a file containing an email in "plain text MIME message format". I am not sure if this is the EML format. The email contains attachments, and I want to extract them and recreate the original files. This is what the attachment part looks like:

...
...
Receive, deliver details
...
...
From: sac ascsac <[email protected]>
Date: Thu, 20 Jan 2011 18:05:16 +0530
Message-ID: <[email protected]>
Subject: Test attachments
To: [email protected]
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

--20cf3054ac85d97721049a465e12
Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10

--20cf3054ac85d97717049a465e10
Content-Type: text/plain; charset=ISO-8859-1

hello this is a test mail. It contains two attachments

--20cf3054ac85d97717049a465e10
Content-Type: text/html; charset=ISO-8859-1

hello this is a test mail. It contains two attachments<br>

--20cf3054ac85d97717049a465e10--
--20cf3054ac85d97721049a465e12
Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"
Content-Disposition: attachment; filename="simple_test.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n2yx60

aGVsbG8gd29ybGQKYWMgYXNj
...
encoded things here
...
ZyBmZyAKCjIKNDIzCnQ2Mwo=
--20cf3054ac85d97721049a465e12
Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"
Content-Disposition: attachment; filename="oscomm_backup_code.php"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n5gxn1

PD9waHAKCg ...
...
encoded things here
...
X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
--20cf3054ac85d97721049a465e12--

I can see that the part from X-Attachment-Id: f_gj5n2yx60 through ZyBmZyAKCjIKNDIzCnQ2Mwo=, both inclusive, is the content of the first attachment. I want to parse those attachments (file names and contents) and create those files.

I got this file after parsing a dbx format file using a DBX Parser class available on PHP Classes.

I searched in many places and did not find much discussion of this on SO other than "Script to parse emails for attachments". Maybe I missed some terms while searching. That answer mentions:

you can use the boundaries to extract the base64 encoded information

But I am not sure which are the boundaries or how exactly to use them. There must already be some library or well-defined method for doing this. I guess I will make many mistakes if I try to reinvent the wheel here.

Comments (1)

吻安 2024-10-21 17:39:38

There's a PHP Mailparse extension; have you tried it?
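
If you go the extension route, something like the following might be a starting point. This is only a rough sketch, assuming the PECL mailparse extension is installed and enabled; `extractAttachmentsWithMailparse()` is a made-up helper name, and treating "any part with a filename" as an attachment is my own simplification.

```php
<?php
// Sketch only: requires the PECL mailparse extension.
// extractAttachmentsWithMailparse() is a hypothetical helper, not part of the extension.
function extractAttachmentsWithMailparse(string $rawMessage): array
{
    if (!extension_loaded('mailparse')) {
        throw new RuntimeException('The mailparse extension is not loaded.');
    }

    $mime = mailparse_msg_create();
    mailparse_msg_parse($mime, $rawMessage);

    $attachments = [];
    // Section IDs such as "1", "1.1", "1.2" cover nested parts as well.
    foreach (mailparse_msg_get_structure($mime) as $sectionId) {
        $part = mailparse_msg_get_part($mime, $sectionId);
        $info = mailparse_msg_get_part_data($part);

        // Assumption: a part that carries a filename is an attachment.
        $filename = $info['disposition-filename'] ?? $info['content-name'] ?? null;
        if ($filename === null) {
            continue;
        }

        // With a null callback the decoded body comes back as a string.
        $attachments[basename($filename)] = mailparse_msg_extract_part($part, $rawMessage, null);
    }

    mailparse_msg_free($mime);
    return $attachments; // filename => decoded content
}
```

Each entry could then be written out with `file_put_contents()`. The extension handles the transfer decoding and the nested multipart structure for you.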

The manual way would be to process the mail line by line. When you hit your first Content-Type header (this one in your example):
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

You have the boundary. This string is used as the boundary between your multiple parts (that's why they call it multipart).
Every time a line starts with two dashes followed by this string, a new part begins. In your example:
--20cf3054ac85d97721049a465e12
The same line with two more dashes at the end (--20cf3054ac85d97721049a465e12--) closes the last part.
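
To make the boundary handling concrete, here is a minimal sketch in plain PHP. The helper names `findBoundary()` and `splitOnBoundary()` and the file name `message.eml` are made up for illustration, and it assumes the boundary value contains no spaces.

```php
<?php
// findBoundary() and splitOnBoundary() are hypothetical helper names for this sketch.

// Pull the boundary parameter out of the message headers.
function findBoundary(string $headers): ?string
{
    // Handles boundary=abc and boundary="abc"; assumes no spaces in the value.
    return preg_match('/boundary="?([^";\s]+)"?/i', $headers, $m) ? $m[1] : null;
}

// Split a multipart body into its parts (each still holds headers + content).
function splitOnBoundary(string $body, string $boundary): array
{
    $open  = '--' . $boundary;        // starts a new part
    $close = '--' . $boundary . '--'; // ends the last part
    $parts = [];
    $current = null;                  // stays null until the first delimiter (skips the preamble)

    foreach (preg_split('/\r\n|\n|\r/', $body) as $line) {
        $trimmed = rtrim($line);
        if ($trimmed === $open || $trimmed === $close) {
            if ($current !== null) {
                $parts[] = implode("\n", $current);
            }
            if ($trimmed === $close) {
                $current = null;
                break;
            }
            $current = [];
            continue;
        }
        if ($current !== null) {
            $current[] = $line;
        }
    }
    if ($current !== null) {          // message ended without a closing delimiter
        $parts[] = implode("\n", $current);
    }
    return $parts;
}

// Usage (check findBoundary() for null in real code):
// [$headers, $body] = preg_split('/\r?\n\r?\n/', file_get_contents('message.eml'), 2);
// $parts = splitOnBoundary($body, findBoundary($headers));
```

A part whose own Content-Type is multipart/alternative (like the first part in your sample) can be split again the same way using its own boundary.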

Every part starts with headers, then a blank line, then the content. By looking at the headers (Content-Type, Content-Disposition) you can determine which parts are attachments, what their type is, and their filename.
Read the whole content, strip the whitespace, base64_decode it, and you've got the binary contents of the file. Does this help?
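
The last step might look roughly like this. It is a sketch under assumptions: `decodeAttachmentPart()` is a made-up helper, it takes one part (headers, blank line, content) as a string, and it only handles parts marked as `attachment` with base64 encoding.

```php
<?php
// Hypothetical helper: turn one MIME part into ['filename' => ..., 'data' => ...],
// or null if the part is not a base64-encoded attachment.
function decodeAttachmentPart(string $part): ?array
{
    // Headers end at the first blank line.
    $pieces = preg_split('/\r?\n\r?\n/', ltrim($part, "\r\n"), 2);
    if (count($pieces) < 2) {
        return null;
    }
    [$headers, $content] = $pieces;

    // Unfold continuation lines so each header sits on a single line.
    $headers = preg_replace('/\r?\n[ \t]+/', ' ', $headers);

    if (!preg_match('/^Content-Disposition:\s*attachment\b.*?filename="?([^";\r\n]+)"?/mi', $headers, $m)) {
        return null;
    }
    // basename() keeps a hostile header from pointing outside the target directory.
    $filename = basename(trim($m[1]));

    if (!preg_match('/^Content-Transfer-Encoding:\s*base64\s*$/mi', $headers)) {
        return null;
    }

    // Strip all whitespace (line breaks included) before decoding.
    $data = base64_decode(preg_replace('/\s+/', '', $content), true);
    return $data === false ? null : ['filename' => $filename, 'data' => $data];
}

// Usage, given $parts from whatever splitting step you use:
// foreach ($parts as $part) {
//     $file = decodeAttachmentPart($part);
//     if ($file !== null) {
//         file_put_contents(__DIR__ . '/' . $file['filename'], $file['data']);
//     }
// }
```

As a sanity check, the first encoded line in your sample, aGVsbG8gd29ybGQKYWMgYXNj, decodes to "hello world", a newline, then "ac asc".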
