Proper PHP way to parse email attachments from EML format
I have a file containing an email in "plain text MIME message format". I am not sure whether this is the EML format. The email contains attachments, and I want to extract them and recreate the original files. This is what the attachment part looks like:
...
...
Receive, deliver details
...
...
From: sac ascsac <[email protected]>
Date: Thu, 20 Jan 2011 18:05:16 +0530
Message-ID: <[email protected]>
Subject: Test attachments
To: [email protected]
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12

--20cf3054ac85d97721049a465e12
Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10

--20cf3054ac85d97717049a465e10
Content-Type: text/plain; charset=ISO-8859-1

hello this is a test mail. It contains two attachments
--20cf3054ac85d97717049a465e10
Content-Type: text/html; charset=ISO-8859-1

hello this is a test mail. It contains two attachments<br>
--20cf3054ac85d97717049a465e10--
--20cf3054ac85d97721049a465e12
Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"
Content-Disposition: attachment; filename="simple_test.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n2yx60

aGVsbG8gd29ybGQKYWMgYXNj
...
encoded things here
...
ZyBmZyAKCjIKNDIzCnQ2Mwo=
--20cf3054ac85d97721049a465e12
Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"
Content-Disposition: attachment; filename="oscomm_backup_code.php"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n5gxn1

PD9waHAKCg ...
...
encoded things here
...
X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
--20cf3054ac85d97721049a465e12--
I can see that the part between X-Attachment-Id: f_gj5n2yx60 and ZyBmZyAKCjIKNDIzCnQ2Mwo= (both inclusive) is the content of the first attachment. I want to parse those attachments (file names and contents) and create those files.
I got this file after parsing a dbx format file using a DBX Parser class available in PHP classes.
I searched in many places and did not find much discussion of this here on SO, other than "Script to parse emails for attachments". Maybe I missed some terms while searching. That answer mentions:
you can use the boundaries to extract the base64 encoded information
But I am not sure which strings are the boundaries or how exactly to use them. There must already be some library or some well-defined method for doing this. I guess I will make many mistakes if I try to reinvent the wheel here.
Comments (1)
There's a PHP Mailparse extension, have you tried it?
The manual way would be to process the mail line by line. When you hit your first Content-Type header (this one in your example):
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12
You have the boundary. This string is used as the boundary between your multiple parts (that's why they call it multipart).
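As a minimal sketch of that first step, the boundary can be pulled out of the header with a regular expression. The function name getBoundary is hypothetical (it is not part of PHP or of Mailparse), and the pattern assumes the header has already been read as a single line:

```php
<?php
// Hypothetical helper: extract the boundary parameter from a Content-Type header.
function getBoundary(string $contentType): ?string
{
    // The boundary value may be quoted or bare; capture either form.
    if (preg_match('/boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))/i', $contentType, $m)) {
        return $m[1] !== '' ? $m[1] : $m[2];
    }
    return null; // no boundary parameter, so not a multipart header
}

$header = 'Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12';
echo getBoundary($header), "\n"; // prints 20cf3054ac85d97721049a465e12
```

The nested multipart/alternative part in your example carries its own boundary, so the same helper would be applied again to that part's Content-Type header.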
Every time a line starts with two dashes followed by this string, a new part begins. The same line with two more dashes appended marks the end of the last part. In your example:
--20cf3054ac85d97721049a465e12
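A rough sketch of that splitting step is below. splitParts is a hypothetical name, the boundary "XYZ" and the sample body are made up for illustration, and the code assumes the whole message body fits in memory:

```php
<?php
// Hypothetical helper: split a multipart body into its raw parts.
function splitParts(string $body, string $boundary): array
{
    $open  = '--' . $boundary;   // delimiter that starts a part
    $close = $open . '--';       // delimiter that ends the last part
    $parts = [];
    $current = null;             // stays null until the first delimiter is seen

    foreach (preg_split('/\r\n|\r|\n/', $body) as $line) {
        $trimmed = rtrim($line);
        if ($trimmed === $open || $trimmed === $close) {
            if ($current !== null) {
                $parts[] = implode("\n", $current); // finish the previous part
            }
            if ($trimmed === $close) {
                $current = null;
                break;                              // nothing after the closing delimiter
            }
            $current = [];                          // start collecting a new part
        } elseif ($current !== null) {
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $parts[] = implode("\n", $current);         // tolerate a missing closing delimiter
    }
    return $parts;
}

$body = "preamble\n--XYZ\nContent-Type: text/plain\n\nhello\n"
      . "--XYZ\nContent-Type: text/html\n\n<b>hello</b>\n--XYZ--\n";
$parts = splitParts($body, 'XYZ');
echo count($parts), "\n"; // prints 2
```

Lines belonging to a nested multipart part (with a different boundary) stay inside the outer part untouched, so the function can be called again on that part's body with the inner boundary.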
Every part will start with headers, a blank line, and content. By looking at the content-type of the headers you can determine which are attachments, what their type is and their filename.
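To illustrate, here is a small sketch that separates one part into headers and body and reads the attachment file name. parsePart and attachmentFilename are hypothetical helpers, the sample body is a shortened stand-in for your first attachment, and RFC 2231 encoded names (filename*=...) are deliberately not handled:

```php
<?php
// Hypothetical helper: split one part into a lowercase header map and a body string.
function parsePart(string $part): array
{
    // Headers end at the first blank line.
    $pieces = preg_split('/\r?\n\r?\n/', $part, 2);
    $body = $pieces[1] ?? '';
    // Unfold continuation lines (header lines that start with whitespace).
    $rawHeaders = preg_replace('/\r?\n[ \t]+/', ' ', $pieces[0]);

    $headers = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a header line
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$headers, $body];
}

// Hypothetical helper: return the file name if the part is an attachment, else null.
function attachmentFilename(array $headers): ?string
{
    $disposition = $headers['content-disposition'] ?? '';
    if (stripos($disposition, 'attachment') !== 0) {
        return null;
    }
    if (preg_match('/filename\s*=\s*"?([^";]+)"?/i', $disposition, $m)) {
        return basename(trim($m[1])); // basename() guards against path tricks
    }
    return null;
}

$part = "Content-Type: text/plain; charset=US-ASCII; name=\"simple_test.txt\"\n"
      . "Content-Disposition: attachment; filename=\"simple_test.txt\"\n"
      . "Content-Transfer-Encoding: base64\n"
      . "X-Attachment-Id: f_gj5n2yx60\n"
      . "\n"
      . "aGVsbG8gd29ybGQ=\n";
[$headers, $body] = parsePart($part);
echo attachmentFilename($headers), "\n";            // prints simple_test.txt
echo $headers['content-transfer-encoding'], "\n";   // prints base64
```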
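Read the whole content of a part, strip the whitespace (including line breaks), base64_decode it (when the part's Content-Transfer-Encoding is base64, as in your example), and you've got the binary contents of the file. Does this help?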
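The decoding step might look like the sketch below. decodeBase64Body is a hypothetical name, the sample body is a made-up two-line base64 string, and the file is written to the system temp directory only for demonstration:

```php
<?php
// Hypothetical helper: decode a base64 part body into raw bytes.
function decodeBase64Body(string $body): string
{
    // Remove line breaks and any other whitespace before decoding.
    $clean = preg_replace('/\s+/', '', $body);
    $binary = base64_decode($clean, true); // strict mode returns false on invalid input
    if ($binary === false) {
        throw new RuntimeException('Part body is not valid base64');
    }
    return $binary;
}

$body = "aGVsbG8g\r\nd29ybGQ=\r\n";
$binary = decodeBase64Body($body);
echo $binary, "\n"; // prints hello world

// Recreate the attachment on disk (target path chosen for illustration only).
$target = sys_get_temp_dir() . '/simple_test.txt';
file_put_contents($target, $binary);
```

Parts with a different Content-Transfer-Encoding (for example quoted-printable or 7bit) need their own handling and should not be passed through base64_decode.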