Why isn't HTML::Obliterate removing my HTML?
I am trying to use the following code, which I have not been able to test yet, because I get the following errors:
#!/usr/bin/perl
use warnings;
use strict;
use Text::Wrap;
use Mail::Box::Manager;
use HTML::Obliterate qw(extirpate_html);
open (MYFILE, '>>data.txt');
binmode(MYFILE, ':encoding(UTF-8)');
my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
access => 'r',
);
my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";
for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
my $to = join( ', ', map { $_->format } $msg->to );
my $from = join( ', ', map { $_->format } $msg->from );
my $date = localtime( $msg->timestamp );
my $subject = $msg->subject;
my $body = $msg->decoded->string;
if ( $msg->isMultipart ) {
foreach my $part ( $msg->parts ) {
if ( $part->contentType eq 'text/html' ) {
my $nohtml = extirpate_html( $msg );
$body =~ s/^>.*$//msg;
$Text::Wrap::columns=80;
print MYFILE wrap("", "", <<"");
\n
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body
}
else {
$body =~ s/^>.*$//msg;
$Text::Wrap::columns=80;
print MYFILE wrap("", "", <<"");
\n
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body
}
}}
All the braces seem to match up, so I am unsure what the problem is:
syntax error at x.pl line 46, near "else"
(Might be a runaway multi-line << string starting on line 36)
Missing right curly or square bracket at x.pl line 63, at end of line
syntax error at x.pl line 63, at EOF
Execution of x.pl aborted due to compilation errors.
Edit:
It now works, but the HTML is not stripped. Instead, a few emails have stuff like
>
> interlaced throughout, which makes the output many more pages than it should be. Is there a better way to do this?
Line 36 seems to be the statement print MYFILE wrap("", "", <<""); which means Perl will wrap the following text until it reaches the terminator "", that is, an empty line. (I never use a confusing terminator like this; I always use END or UNTIL_END for simplicity.)

That terminator is then found on line 45 (the empty line), so the next thing Perl processes is line 46, the else. That doesn't make sense, because the previous if hasn't been closed yet: line 44, which has the }, comes before the terminator "" and is therefore treated as text for wrapping. Perl notices this and kindly suggests that the "runaway multi-line << string starting on line 36" might be the culprit.

You need to swap lines 44 and 45 so that the terminator "" (the empty line) comes first, and then close the if with }. The second wrap in your example does this correctly.
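To make the heredoc behaviour above concrete, here is a minimal sketch using only core modules. The variable values and the END_MSG terminator name are placeholders chosen for illustration, not taken from the question. It shows that a heredoc opened with <<"" ends at the first empty line, while a named terminator lets the closing brace sit unambiguously after the text.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns = 80;

# Placeholder values, for illustration only.
my ($from, $subject, $body) = ('alice@example.com', 'Hello', 'Body text.');

# With an empty terminator, the heredoc stops at the first empty line.
my $short = <<"";
only this line is captured

# With a named terminator, empty lines inside the heredoc are ordinary text.
my $text;
if ($body ne '') {
    $text = wrap('', '', <<"END_MSG");
From: $from
Subject: $subject

$body
END_MSG
}    # the closing brace comes after the terminator, so Perl parses it as code
else {
    $text = "(no body)\n";
}

print $short, $text;
```

A named terminator also means an accidental blank line in the message template cannot end the heredoc early.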
Answer to your modified question:
Instead of extirpating the message body, you extirpate the whole message ($msg). And then you don't use the result, $nohtml, anywhere.
Perhaps you need to pass the decoded body text to extirpate_html instead, and then use $nohtml as the message body for wrap.
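One way the suggestion above could look is sketched below. printable_body is a hypothetical helper, not part of Mail::Box or HTML::Obliterate, and it takes the stripping function as a code reference so that the sketch does not depend on either module being installed. It assumes only that each part responds to contentType and decoded->string, as the question's code already does. The substitution uses /mg rather than the question's /msg, because with /s the dot also matches newlines and the pattern would delete everything from the first quoted line to the end of the text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text to print for one message part.
# $strip_html is any code reference that takes an HTML string and returns
# plain text.
sub printable_body {
    my ($part, $strip_html) = @_;

    # Decode this part only, not the enclosing message object.
    my $text = $part->decoded->string;

    # Strip markup from HTML parts and keep the result.
    $text = $strip_html->($text) if $part->contentType eq 'text/html';

    # Blank out quoted reply lines, one line at a time.
    $text =~ s/^>.*$//mg;

    return $text;
}
```

Inside the question's foreach loop this might be called as printable_body($part, \&extirpate_html), with the return value interpolated into the heredoc in place of $body. That call assumes extirpate_html accepts a single string and returns the stripped string.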