Perl inserting spurious characters into large blocks of text in emails

Posted 2024-11-09 22:09:25

I am doing maintenance on an online form written in Perl (I have no knowledge of Perl). The details entered in the form get emailed to someone to handle. The tester came back with this error in the email:

Large blocks of text have spurious characters inserted. Triggered at approximately 1961
characters in each text field or text area. An exclamation mark and space are inserted at
approx 1961 then a space eight characters later then repeating approximately every 2048 characters.

So I tried to repeat this error, and this is what it returned (characters 1960 - 1970):

a! defghij

I have no idea what would cause this to occur. The only way I can think to "fix" it would be to do this:

if (length($someInput) > 1500) {   # numeric comparison; gt would compare as strings
    $someInput=substr($someInput, 0, 1500);
}

Does anyone know what causes this to happen in Perl, and how I can fix it?

EDIT This is the function that I run every field through. The result is then put straight into the email HTML.

#sanitises and returns the given input
sub sanitiseInput {
    my ($input) = @_;
    $input = trim(param($input));
    $input = HTML::Entities::decode($input);
    $input =~ s/<script[^>]*?>.*?<\/script>//gi; # strip out javascript
    $input =~ s/<style[^>]*?>.*?<\/style>//gi;   # strip out styles
    $input =~ s/<![\s\S]*?--[ \t\n\r]*>//gi;     # strip out multi-line comments
    $input =~ s/&/&amp;/gi;                      # & to &amp;
    $input =~ s/</&lt;/gi;                       # < to &lt;
    $input =~ s/>/&gt;/gi;                       # > to &gt;
    $input =~ s/"/&#34;/gi;                      # " to &#34;
    $input =~ s/'/&#39;/gi;                      # ' to &#39;
    $input =~ s/\r\n/<br>/gi;                    # return and newline to <br>
    $input =~ s/\r/<br>/gi;                      # return to <br>
    $input =~ s/\n/<br>/gi;                      # newline to <br>
    return $input;                               #return the new value
}

EDIT This is the function that emails the HTML:

sub mailer {
    my ($from_eddress, $to_eddress, $subject, $mail_content, $fail_eddress)=@_;
    open(MAIL, "|/usr/sbin/sendmail -f $from_eddress $to_eddress") or print "Cannot fork to mail - $!\n";
    print MAIL "From: $from_eddress\n";
    print MAIL "To: $to_eddress \n";
    print MAIL "Subject: $subject\n";
    if (defined $fail_eddress && $fail_eddress ne '') { print MAIL "fail-to: $fail_eddress\n"; }  # ne compares strings; != is numeric
    print MAIL "Content-type: text/html\n\n";
    print MAIL "\n";
    print MAIL "<html><head><style>body, p, th, td {font-size: 0.75em; font-family:  Arial, Helvetica, sans-serif;} a {font-size: 1em; font-family:  Arial, Helvetica, sans-serif;} .large{font-size: 1.2em;} .small{font-size: .8em;} </style></head><body>";
    print MAIL "$mail_content";
    print MAIL "</body></html>";
    close (MAIL);
}

Comments (3)

寻找我们的幸福 2024-11-16 22:09:25

I don't think it happens inside your Perl program.

I have seen this before when trying to mail stuff by piping it to sendmail. There's a line length limit in the mail specs, but I actually suspect that sendmail is using a 2048-byte input buffer.

Point is, you are removing all the linebreaks from the input (converting them to <br>) before you pipe it to sendmail. Don't. Maybe add this as the last substitution:

    $input =~ s/<br>/<br>\r\n/gi;                      # break up the single line

(The fact that you get a lower distance to the first ! than to the next makes me suspect that sendmail counts the leading HTML bits as part of the same line -- the RFC 822 format calls for \r\n line breaks).

Also, if a user is likely to input a 3 KB rant without linebreaks, you might want to break lines on whitespace instead of only on the original linebreaks.

EDIT: Just noticed that my regexp took away the <br> -- brain fart. Better now?
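
As a rough illustration of the two suggestions above (a newline after each <br>, plus folding on whitespace when the user typed no linebreaks), here is a minimal sketch. The helper name wrap_for_mail and the 76-character default are assumptions made for this example and are not part of the original script. It uses a bare "\n" on the assumption that the local sendmail accepts that on its standard input, and it is only meant for the sanitised field values, where a space can safely become a newline.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: fold the HTML so that
# lines piped to sendmail stay at or under $max characters wherever a space allows.
sub wrap_for_mail {
    my ($html, $max) = @_;
    $max ||= 76;                       # assumed limit, chosen for illustration

    $html =~ s/<br>/<br>\n/gi;         # the substitution suggested above, using "\n"

    my @lines;
    for my $line (split /\n/, $html, -1) {
        while (length($line) > $max) {
            # Prefer the last space at or before $max; otherwise take the next one.
            my $cut = rindex($line, ' ', $max);
            $cut = index($line, ' ', $max) if $cut <= 0;
            last if $cut <= 0;         # no space left: this line cannot be folded
            push @lines, substr($line, 0, $cut);
            $line = substr($line, $cut + 1);
        }
        push @lines, $line;
    }
    return join("\n", @lines);
}

# Usage: fold a long run of text that contains no linebreaks at all.
my $folded = wrap_for_mail(join(' ', ('word') x 1000));
my ($longest) = sort { $b <=> $a } map { length } split /\n/, $folded;
print "longest line: $longest characters\n";
```

In sanitiseInput this could replace the three newline substitutions' final step, for example by returning wrap_for_mail($input). A single word longer than the limit is left unfolded, so it would still need separate handling.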

笑咖 2024-11-16 22:09:25

It seems unlikely (impossible) that Perl is randomly doing this. There must be a bug either in the code or in whatever process is feeding the data to Perl.

We don't have enough information to speculate further.

蓝眼睛不忧郁 2024-11-16 22:09:25

At first glance these regexes look to me like they could lose the first ? character in:

$input =~ s/<script[^>]*?>.*?<\/script>//gi; # strip out javascript
$input =~ s/<style[^>]*?>.*?<\/style>//gi;   # strip out styles
$input =~ s/<![\s\S]*?--[ \t\n\r]*>//gi;     # strip out multi-line comments

Also, that last regex I listed could be problematic:

$input =~ s/<![\s\S]*?--[ \t\n\r]*>//gi;     # strip out multi-line comments

The \S in [\s\S] could match past the end of a multi-line comment because \S would match on [->] characters.
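
One way to check this concern is to run the pattern on a couple of sample strings. The sketch below wraps the comment-stripping substitution from the question in a hypothetical helper, strip_comments, and the sample inputs are made up for illustration. In these two cases the lazy *? stops at the first closing -->, but a <! that does not open a comment causes everything up to the next --> to be removed.

```perl
use strict;
use warnings;

# The comment-stripping pattern from the question, wrapped in a hypothetical
# helper so that it can be tried on sample strings.
sub strip_comments {
    my ($input) = @_;
    $input =~ s/<![\s\S]*?--[ \t\n\r]*>//gi;
    return $input;
}

# Two ordinary comments: each match ends at the first "-->" it can reach.
print strip_comments("a<!-- one -->b<!-- two -->c"), "\n";                    # prints "abc"

# A "<!" that is not a comment opener: the text before the next "-->" is lost too.
print strip_comments("<!DOCTYPE html><p>keep me</p><!-- note -->end"), "\n";  # prints "end"
```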
