How do I upload binary files under mod_perl using CGI.pm?

Posted on 2024-11-16 06:10:53

I have a big piece of production code that works. But after I set up a new environment in a virtual machine, I have one issue: every time I upload a binary file, it gets mangled by Unicode conversions.

Here is the sub where the issue occurs:

sub save_uploaded_file
{
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw';
    binmode $fh, ':raw';
    while ($br = sysread($file, $buffer, 16384))
    {
        syswrite($fh, $buffer, $br);
    }
    close $fh;
    return $fname;
}

It is used to upload zip archives, but they arrive malformed (their size is always bigger than the original). I looked inside them with a hex editor and found lots of Unicode replacement characters, encoded in UTF-8 (EF BF BD).

I figured out that the total number of bytes read is bigger than the original file, so the problem starts at sysread.

Text files upload fine.

Update: here are the first few bytes of the transferred file:

0000000: 504b 0304 1400 0000 0800 efbf bd1c efbf  PK..............
0000010: bd3e efbf bd1d 3aef bfbd efbf bd02 0000  .>....:.........
0000020: efbf bd05 0000 0500 1c00 422e 786d 6c55  ..........B.xmlU
0000030: 5409 0003 5cef bfbd efbf bd4d 18ef bfbd  T...\......M....
0000040: efbf bd4d 7578 0b00 0104 efbf bd03 0000  ...Mux..........
0000050: 0404 0000 00ef bfbd efbf bdef bfbd 6bef  ..............k.

And the original:

0000000: 504b 0304 1400 0000 0800 b81c d33e df1d  PK...........>..
0000010: 3aa0 8102 0000 a405 0000 0500 1c00 422e  :.............B.
0000020: 786d 6c55 5409 0003 5cd4 fc4d 18c7 fc4d  xmlUT...\..M...M
0000030: 7578 0b00 0104 e803 0000 0404 0000 008d  ux..............
0000040: 94df 6bdb 3010 c7df 03f9 1f0e e1bd 254e  ..k.0.........%N
0000050: ec74 6c85 d825 2bac 9442 379a c25e ca8a  .tl..%+..B7..^..
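
The size growth described above can be checked against the two dumps. The following is only a minimal sketch: the helper name `count_replacement_sequences` is hypothetical, and the byte strings are the first 20 bytes of the original and the matching 30 bytes of the uploaded copy, taken from the dumps. Each single byte that turns into EF BF BD adds two bytes, so the growth should equal twice the number of replacement sequences.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of EF BF BD (the UTF-8 encoding
# of U+FFFD) in a byte string.
sub count_replacement_sequences {
    my ($bytes) = @_;
    my $count = () = $bytes =~ /\xEF\xBF\xBD/g;
    return $count;
}

# First 20 bytes of the original file and the corresponding 30 bytes of
# the uploaded copy, copied from the hex dumps.
my $original = pack 'H*', '504b0304140000000800b81cd33edf1d3aa08102';
my $uploaded = pack 'H*',
    '504b0304140000000800efbfbd1cefbfbd3eefbfbd1d3aefbfbdefbfbd02';

my $replaced = count_replacement_sequences($uploaded);
my $growth   = length($uploaded) - length($original);

printf "replacement sequences: %d\n", $replaced;
printf "size growth: %d bytes (2 x %d = %d)\n", $growth, $replaced, 2 * $replaced;
```

The same helper could be run over a whole uploaded file read with the `:raw` layer to see whether the extra bytes are fully explained by replacement sequences.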

Update 2: The environment is CentOS 5.6, Perl 5.8.8, Apache 2.2.3.

Comments (4)

迷爱 2024-11-23 06:10:53

Does tmpnam return a filehandle marked as utf8? I think not!

try binmode $fh, ":utf8" ;

白芷 2024-11-23 06:10:53

sysread is reading the file as UTF-8, but the file is not UTF-8. The first ten bytes are in the Basic Latin range (00-7F), so they are interpreted as the same bytes. The next byte, b8, is not valid at that position, so it is replaced by EF BF BD, the UTF-8 encoding of \x{FFFD} (a special character that indicates a decoding error).
In the dump, every byte greater than 7F is replaced by \x{FFFD} in the same way.

What Perl version and OS are you using?
There is a report (perl bug 75106) titled: binmode $fh, ":raw" doesn't undo :utf8 on win32
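
The substitution described above can be reproduced outside the upload path. This is a sketch under assumptions, not a diagnosis of where the decoding happens in the poster's stack: it uses the core Encode module, whose `decode` substitutes U+FFFD for malformed input when no CHECK argument is given, and the helper name `utf8_roundtrip_hex` is hypothetical. The input is the first 20 bytes of the original zip from the question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw bytes as UTF-8 (malformed bytes become
# U+FFFD because no CHECK argument is passed), encode the characters back
# to UTF-8, and return the resulting bytes as a hex string.
sub utf8_roundtrip_hex {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return unpack 'H*', encode('UTF-8', $chars);
}

# First 20 bytes of the original file from the question's hex dump.
my $original = pack 'H*', '504b0304140000000800b81cd33edf1d3aa08102';

print utf8_roundtrip_hex($original), "\n";
```

If the printed hex matches the start of the corrupted dump, the damage is consistent with the upload having been decoded as UTF-8 and re-encoded somewhere before the bytes reached the temp file.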

千と千尋 2024-11-23 06:10:53

As far as I know, Perl 5 doesn't swap in the replacement character in any of its I/O layers. The only conversions I am aware of are newline conversions (i.e. the text layer). Are you certain the source file does not contain those byte sequences?

This code works for me; does it work for you?

#!/usr/bin/perl

use strict;
use warnings;

use File::Temp qw/:POSIX/;

sub save_uploaded_file {
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw'
        or die "could not change input file to raw: $!";
    binmode $fh, ':raw'
        or die "could not change tempfile to raw: $!";
    while ($br = sysread($file, $buffer, 16384)) {
        syswrite($fh, $buffer, $br);
    }
    close $fh
        or die "could not close tempfile: $!";
    return $fname;
}

sub check {
    my $input_file = shift;

    print "$input_file is ", -s $input_file, " bytes long\n"; 

    open my $fh, "<:raw", $input_file
        or die "could not open $input_file for reading: $!";

    my $bytes = sysread $fh, my $buf, 4096;

    print "read $bytes bytes: ", 
        join(", ", map { sprintf "%02x", $_ } unpack "C*", $buf),
        "\n";
}

my $input_file = "test.bin";

open my $fh, ">:raw", $input_file
    or die "could not open $input_file for writing: $!";

print $fh pack "CC", 0xFF, 0xFD
    or die "could not write to $input_file: $!";

close $fh
    or die "could not close $input_file: $!";

check $input_file;

open my $newfh, "<", $input_file
    or die "could not open $input_file: $!";
my $new_file = save_uploaded_file $newfh;

check $new_file;

叫嚣ゝ 2024-11-23 06:10:53

I had what I think is the same problem. The error seemed to occur very early, because none of my code ever executed when a client attempted to load a binary file. I fixed it by setting STDIN to "raw" (binary) at the top of the script:

binmode(STDIN, ':raw') ;
