Why is HTML::Obliterate not removing my HTML?

Posted on 2024-07-11 03:43:58

I am trying to use the following code, which I have not been able to test yet, because I get the following errors:

#!/usr/bin/perl
use warnings;
use strict;
use Text::Wrap;
use Mail::Box::Manager;
use HTML::Obliterate qw(extirpate_html);


open (MYFILE, '>>data.txt');
binmode(MYFILE, ':encoding(UTF-8)');


my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
    access          => 'r',
);

my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
    my $to          = join( ', ', map { $_->format } $msg->to );
    my $from        = join( ', ', map { $_->format } $msg->from );
    my $date        = localtime( $msg->timestamp );
    my $subject     = $msg->subject;
    my $body        = $msg->decoded->string;


if ( $msg->isMultipart ) {
    foreach my $part ( $msg->parts ) {
        if ( $part->contentType eq 'text/html' ) {
          my $nohtml = extirpate_html( $msg );
$body =~ s/^>.*$//msg;
$Text::Wrap::columns=80;
print MYFILE wrap("", "", <<"");
\n
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body
        }

else {


$body =~ s/^>.*$//msg;
$Text::Wrap::columns=80;
print MYFILE wrap("", "", <<"");
\n
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

}

}}

All the braces seem to match up, so I am unsure what the problem is:

syntax error at x.pl line 46, near "else"
  (Might be a runaway multi-line << string starting on line 36)
Missing right curly or square bracket at x.pl line 63, at end of line
syntax error at x.pl line 63, at EOF
Execution of x.pl aborted due to compilation errors.

Edit:

It now works, but the HTML is not stripped: instead, a few emails have stuff like
>
> interlaced throughout, which makes the output many more pages than it should be. Is there a better way to do this?


Comments (2)

屋檐 2024-07-18 03:43:58

So line 36 seems to be

print MYFILE wrap("", "", <<"");

which means Perl reads the text that follows as a here-document, up to the terminator "" (an empty line), and passes it to wrap. (I never use a confusing terminator like this; I always use END or UNTIL_END for simplicity.)

That terminator is then found on line 45 (the empty line), meaning the next thing Perl processes is line 46:

else {

which doesn't make sense, since the previous if hasn't been closed yet: line 44, which has the }, comes before the terminator "", so it is treated as text for wrapping. Perl notices this and kindly suggests that this might be the culprit:

(Might be a runaway multi-line << string starting on line 36)

You need to swap lines 44 and 45 so that the terminator "" (the empty line) comes first, and then close the if with }. The second wrap in your example does this correctly.
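
As a rough sketch of that ordering, the example below uses a named terminator (END_MESSAGE, in the spirit of END or UNTIL_END) so that the end of the here-document is visible, and places the closing brace after it. The function name format_message and the sample values are hypothetical and exist only for illustration; only core modules are used.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns = 80;

# Hypothetical helper: formats one message as wrapped plain text.
sub format_message {
    my ($from, $to, $subject, $body) = @_;

    if (length $body) {
        # The here-document runs until the line containing only END_MESSAGE.
        return wrap('', '', <<"END_MESSAGE");
From: $from
To: $to
Subject: $subject

$body
END_MESSAGE
    }    # this brace comes after the terminator, so Perl parses it as code

    return '';
}

# Sample values are made up for the demonstration.
print format_message('a@example.com', 'b@example.com', 'Hello', 'Body text');
```

With a named terminator, a stray } inside the here-document is easier to spot than with the empty-line form, because the text always ends at a visible marker.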

同展鸳鸯锦 2024-07-18 03:43:58

Answer to your modified question:

You are extirpating the whole message object rather than the message body, and then you never use the result anywhere.

my $nohtml = extirpate_html( $msg );
$body =~ s/^>.*$//msg;
$Text::Wrap::columns=80;
print MYFILE wrap("", "", <<"");
\n
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

Perhaps you need to change it to:

my $nohtml = extirpate_html( $body );
$nohtml =~ s/^>.*$//msg;

and then use $nohtml, instead of $body, as the message body passed to wrap.
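
The flow described above (clean the body string, then wrap the cleaned result) could be sketched roughly as follows. So that it runs with core modules only, a naive regex helper, strip_tags_naive, stands in for extirpate_html; that helper, clean_body, and the sample text are all hypothetical and are not part of HTML::Obliterate. The sketch also drops the /s modifier from the substitution, because with /s the . matches newlines too and the match would run from the first quoted line to the end of the string.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical stand-in for extirpate_html: removes anything that looks
# like a tag. This is a naive regex, not a real HTML parser.
sub strip_tags_naive {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper: takes the body string (not the message object),
# strips the markup, then removes quoted reply lines.
sub clean_body {
    my ($body) = @_;
    my $nohtml = strip_tags_naive($body);
    $nohtml =~ s/^>.*\n?//mg;    # no /s: each match stops at its own line
    return $nohtml;
}

$Text::Wrap::columns = 80;

# Sample text made up for the demonstration.
my $body = "<p>Hello <b>world</b></p>\n> quoted reply\nThanks\n";
print wrap('', '', clean_body($body));
```

The point of the sketch is that the cleaned value, not the original $body, is what gets passed to wrap.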
