如何通过 SSL 下载 IMAP 邮件附件并使用 Perl 将其保存在本地?

发布于 2024-08-25 03:21:19 字数 310 浏览 11 评论 0原文

我需要有关如何从主题行中包含附件和当前日期(即 YYYYMMDD 格式)的 IMAP 邮件下载附件并将附件保存到本地路径的建议。

我浏览了 Perl 模块 Mail::IMAPClient 并且能够连接到IMAP 邮件服务器,但在其他任务上需要帮助。另一件需要注意的事情是我的 IMAP 服务器需要 SSL 身份验证。

附件也可以是 gz、tar 或 tar.gz 文件。

I need suggestions on how can I download attachments from my IMAP mails which have attachments and current date in subject line i.e. YYYYMMDD format and save the attachments to a local path.

I went through the Perl module Mail::IMAPClient and am able to connect to the IMAP mail server, but need help on other tasks. One more thing to note is that my IMAP sever requires SSL auth.

Also the attachments could be gz, tar or tar.gz files.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

染墨丶若流云 2024-09-01 03:21:19

下面是一个可以完成您想要的操作的简单程序。

#! /usr/bin/perl

use warnings;
use strict;

Email::MIME 的最低版本是在引入 walk_parts 时使用的。

use Email::MIME 1.901;
use IO::Socket::SSL;
use Mail::IMAPClient;
use POSIX qw/ strftime /;
use Term::ReadKey;

您不想在程序中对密码进行硬编码,对吗?

sub read_password {
  local $| = 1;
  print "Enter password: ";

  ReadMode "noecho";
  my $password = <STDIN>;
  ReadMode "restore";

  die "$0: unexpected end of input"
    unless defined $password;

  print "\n";
  chomp $password; 
  $password;
}

使用 SSL 连接。我们应该能够通过构造函数的一个简单的 Ssl 参数来做到这一点,但一些供应商选择在他们的包中破坏它。

my $pw = read_password;
my $imap = Mail::IMAPClient->new(
 #Debug    => 1,
  User     => "you\@domain.com",
  Password => $pw,
  Uid      => 1,
  Peek     => 1,  # don't set \Seen flag
  Socket   => IO::Socket::SSL->new(
                Proto    => 'tcp',
                PeerAddr => 'imap.domain.com',
                PeerPort => 993,
              ),
);

die "$0: connect: $@" if defined $@;

如果您想要收件箱以外的文件夹,请更改它。

$imap->select("INBOX")
  or die "$0: select INBOX: ", $imap->LastError, "\n";

使用 IMAP 搜索,我们查找主题包含 YYYYMMDD 格式的今天日期的所有邮件。日期可以位于主题中的任何位置,因此,例如,主题“foo bar baz 20100316”将匹配今天。

my $today = strftime "%Y%m%d", localtime $^T;
my @messages = $imap->search(SUBJECT => $today);
die "$0: search: $@" if defined $@;

对于每条此类消息,将其附件写入当前目录中的文件。我们编写附件的最外层,而不挖掘嵌套附件。内容类型中带有 name 参数的部分(如 image/jpeg; name="foo.jpg")被假定为附件,我们忽略所有其他部分。已保存附件的名称由以下部分组成,以 - 分隔:今天的日期、其 IMAP 消息 ID、其在消息中的位置的基于 1 的索引及其名称。

foreach my $id (@messages) {
  die "$0: funky ID ($id)" unless $id =~ /\A\d+\z/;

  my $str = $imap->message_string($id)
    or die "$0: message_string: $@";

  my $n = 1;
  Email::MIME->new($str)->walk_parts(sub {
    my($part) = @_;
    return unless ($part->content_type =~ /\bname=([^"]+)/ 
                or $part->content_type =~ /\bname="([^"]+)"/); # " grr...

    my $name = "./$today-$id-" . $n++ . "-$1";
    print "$0: writing $name...\n";
    open my $fh, ">", $name
      or die "$0: open $name: $!";
    print $fh $part->content_type =~ m!^text/!
                ? $part->body_str
                : $part->body
      or die "$0: print $name: $!";
    close $fh
      or warn "$0: close $name: $!";
  });
}

A simple program that does what you want is below.

#! /usr/bin/perl

use warnings;
use strict;

The minimum version for Email::MIME is for when walk_parts was introduced.

use Email::MIME 1.901;
use IO::Socket::SSL;
use Mail::IMAPClient;
use POSIX qw/ strftime /;
use Term::ReadKey;

You don't want to hardcode your password in your program, do you?

sub read_password {
  local $| = 1;
  print "Enter password: ";

  ReadMode "noecho";
  my $password = <STDIN>;
  ReadMode "restore";

  die "$0: unexpected end of input"
    unless defined $password;

  print "\n";
  chomp $password; 
  $password;
}

Connect using SSL. We ought to be able to be able to do this with a simple Ssl parameter to the constructor, but some vendors have chosen to break it in their packages.

my $pw = read_password;
my $imap = Mail::IMAPClient->new(
 #Debug    => 1,
  User     => "you\@domain.com",
  Password => $pw,
  Uid      => 1,
  Peek     => 1,  # don't set \Seen flag
  Socket   => IO::Socket::SSL->new(
                Proto    => 'tcp',
                PeerAddr => 'imap.domain.com',
                PeerPort => 993,
              ),
);

die "$0: connect: $@" if defined $@;

If you want a folder other than the inbox, change it.

$imap->select("INBOX")
  or die "$0: select INBOX: ", $imap->LastError, "\n";

Using IMAP search, we look for all messages whose subjects contain today's date in YYYYMMDD format. The date can be anywhere in the subject, so, for example, a subject of "foo bar baz 20100316" would match today.

my $today = strftime "%Y%m%d", localtime $^T;
my @messages = $imap->search(SUBJECT => $today);
die "$0: search: $@" if defined $@;

For each such message, write its attachments to files in the current directory. We write the outermost layer of attachments and do not dig for nested attachments. A part with a name parameter in its content type (as in image/jpeg; name="foo.jpg") is assumed to be an attachment, and we ignore all other parts. A saved attachment's name is the following components separated by -: today's date, its IMAP message ID, a one-based index of its position in the message, and its name.

foreach my $id (@messages) {
  die "$0: funky ID ($id)" unless $id =~ /\A\d+\z/;

  my $str = $imap->message_string($id)
    or die "$0: message_string: $@";

  my $n = 1;
  Email::MIME->new($str)->walk_parts(sub {
    my($part) = @_;
    return unless ($part->content_type =~ /\bname=([^"]+)/ 
                or $part->content_type =~ /\bname="([^"]+)"/); # " grr...

    my $name = "./$today-$id-" . $n++ . "-$1";
    print "$0: writing $name...\n";
    open my $fh, ">", $name
      or die "$0: open $name: $!";
    print $fh $part->content_type =~ m!^text/!
                ? $part->body_str
                : $part->body
      or die "$0: print $name: $!";
    close $fh
      or warn "$0: close $name: $!";
  });
}
吝吻 2024-09-01 03:21:19

如果您想坚持使用 Mail::IMAPClient,您可以告诉它 < a href="http://search.cpan.org/perldoc/Mail::IMAPClient#Ssl" rel="nofollow noreferrer">使用 SSL。

或者,Net::IMAP::Simple::SSL 也可以帮助你。该接口与 Net::IMAP::Simple< 提供的接口相同/a>.

收到邮件后,解析带附件的电子邮件将显示如何提取附件。我还没有尝试过,但我的直觉是使用 Email::MIME:: walk_parts 可用于显着简化 PerlMonks 文章中显示的脚本。

If you want to stick with Mail::IMAPClient, you can tell it to use SSL.

Alternatively, Net::IMAP::Simple::SSL could also help you with that. The interface is the same as the one provided by Net::IMAP::Simple.

Once you have the message, Parsing emails with attachments shows how to extract attachments. I haven't tried it, but my hunch is that using Email::MIME::walk_parts can be used to significantly simplify the script shown in that PerlMonks article.

葬花如无物 2024-09-01 03:21:19

我稍微改变了从 @Greg 下载附件的方法,因为下载 SAP XML 附件被证明是不可靠的。它们不遵循 Content-Type: application/pdf; name=XXXXX 标准所以,给我带来了很多问题。示例:

Content-ID: <[email protected]>
Content-Disposition: attachment;
    filename="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml"
Content-Type: application/xml
Content-Descripton: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml

程序的其余部分几乎保持不变。不同之处在于,我现在使用 MIME::Parser 来检索所有消息,并且丢弃所有与正文和图像相关的内容。我还删除了 Peek =>; 1 因为我想在下载消息后将其标记为已读(并且仅在未读消息上导航)。 Log::Logger 帮助创建集中式日志:

--- Snippet 1 --- Libs

#! /usr/bin/perl
use warnings;
use strict;
use Mail::IMAPClient; #IMAP connection
use Log::Logger; #Logging facility
use MIME::Parser; #Mime "slicer"
use DateTime; #Date
use File::Copy; #File manipulation
use File::Path qw( mkpath );

--- Snippet 2 --- Log初始化 >

$log_script = new Log::Logger;
$log_script->open_append("/var/log/downloader.log");
my $dt = DateTime->now;
$dt->set_time_zone('America/Sao_Paulo');
$hour = (join ' ', $dt->ymd, $dt->hms);

--- 代码片段 3 --- 邮件下载器

$imap->select($remote_dir) or ($log_script->log("$hour: Account $account, Dir $remote_dir. Check if this folder exists") and next);
# Select unseen messages only
my @mails = ($imap->unseen);
foreach my $id (@mails) {
  my $subject = $imap->subject($id);
  my $str = $imap->message_string($id) or ($log_script->log("$hour: Account $account, Email \<$subject\> with problems. Crawling through next email") and next);
  my $parser = MIME::Parser->new();
  $parser->output_dir( $temp_dir );
  $parser->parse_data( $str );
  opendir(DIR, $temp_dir);
  foreach $file (readdir(DIR)) {
    next unless (-f "$temp_dir/$file");
    if ("$file" =~ /^msg/i){ # ignores body
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } elsif (("$file" =~ /jpg$/i) # ignores signature images
          or ("$file" =~ /gif$/i)
          or ("$file" =~ /png$/i)) {
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } else { # move attachments to destination dir
      $log_script->log("$hour: Account: $account, File $file, Email \<$subject\>, saved $local_dir");
      move "$temp_dir/$file", "$local_dir";
    };
 };
  $log_script->log("$hour: Files from email \<$subject\> ignored as they are body related stuff: $body") if $body;

I have changed a little my approach to download attachments from @Greg, since it was shown unreliable to download SAP XML attachments. They do not follow the Content-Type: application/pdf; name=XXXXX standard so, it gave me a lot of problems. Example:

Content-ID: <[email protected]>
Content-Disposition: attachment;
    filename="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml"
Content-Type: application/xml
Content-Descripton: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml

The rest of the program remains almost the same. The difference is that i´m now using MIME::Parser to retrieve all the message, and i throw away all that is body and image related. I also removed the Peek => 1 since i wanted to mark the messages as read after they got downloaded(and only navigate on unread messages). Log::Logger helped to create a centralized log:

--- Snippet 1 --- Libs

#! /usr/bin/perl
use warnings;
use strict;
use Mail::IMAPClient; #IMAP connection
use Log::Logger; #Logging facility
use MIME::Parser; #Mime "slicer"
use DateTime; #Date
use File::Copy; #File manipulation
use File::Path qw( mkpath );

--- Snippet 2 --- Log initialization

$log_script = new Log::Logger;
$log_script->open_append("/var/log/downloader.log");
my $dt = DateTime->now;
$dt->set_time_zone('America/Sao_Paulo');
$hour = (join ' ', $dt->ymd, $dt->hms);

--- Snippet 3 --- Mail downloader

$imap->select($remote_dir) or ($log_script->log("$hour: Account $account, Dir $remote_dir. Check if this folder exists") and next);
# Select unseen messages only
my @mails = ($imap->unseen);
foreach my $id (@mails) {
  my $subject = $imap->subject($id);
  my $str = $imap->message_string($id) or ($log_script->log("$hour: Account $account, Email \<$subject\> with problems. Crawling through next email") and next);
  my $parser = MIME::Parser->new();
  $parser->output_dir( $temp_dir );
  $parser->parse_data( $str );
  opendir(DIR, $temp_dir);
  foreach $file (readdir(DIR)) {
    next unless (-f "$temp_dir/$file");
    if ("$file" =~ /^msg/i){ # ignores body
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } elsif (("$file" =~ /jpg$/i) # ignores signature images
          or ("$file" =~ /gif$/i)
          or ("$file" =~ /png$/i)) {
      $body .= "$file ";
      unlink "$temp_dir/$file";
    } else { # move attachments to destination dir
      $log_script->log("$hour: Account: $account, File $file, Email \<$subject\>, saved $local_dir");
      move "$temp_dir/$file", "$local_dir";
    };
 };
  $log_script->log("$hour: Files from email \<$subject\> ignored as they are body related stuff: $body") if $body;
云朵有点甜 2024-09-01 03:21:19

我更喜欢 Greg 概述的 Mail::IMAPClient 方法,但它对于binmode() 输出文件句柄,即防止 Windows 将 0x0A 字节假定为换行符并用 CRLF 替换它们,从而使二进制文件无效。
我很抱歉将此伪装为答案,评论是适当的,但我现在没有任何声誉。

I prefer the Mail::IMAPClient approach outlined by Greg, but it is essential to binmode() the output filehandle, namely to prevent Windows from assuming 0x0A bytes to be linefeeds and replacing them by CRLFs and so invalidating binary files.
I'm sorry to disguise this as an answer, comments would be appropriate, but I don't own any reputation by now.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文