How do I upload a binary file in mod_perl using CGI.pm?

Posted 2024-11-16 06:10:53

I have a large piece of production code that works. But after I set up a new environment in a virtual machine, I ran into one issue: every time I upload a binary file, it gets mangled by Unicode conversions.

Here is the sub where the issue occurs:

sub save_uploaded_file
{
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw';
    binmode $fh, ':raw';
    while ($br = sysread($file, $buffer, 16384))
    {
        syswrite($fh, $buffer, $br);
    }
    close $fh;
    return $fname;
}

It's used to upload zip archives, but they arrive malformed (their size is always larger than the original). I looked inside them with a hex editor and found lots of Unicode replacement characters, encoded in UTF-8 (EF BF BD).

I figured out that the total number of bytes read is larger than the original file. So the problem starts at sysread.

Text files upload fine.

Update:
Here is a hex dump of the first few bytes of the transferred file:

0000000: 504b 0304 1400 0000 0800 efbf bd1c efbf  PK..............
0000010: bd3e efbf bd1d 3aef bfbd efbf bd02 0000  .>....:.........
0000020: efbf bd05 0000 0500 1c00 422e 786d 6c55  ..........B.xmlU
0000030: 5409 0003 5cef bfbd efbf bd4d 18ef bfbd  T...\......M....
0000040: efbf bd4d 7578 0b00 0104 efbf bd03 0000  ...Mux..........
0000050: 0404 0000 00ef bfbd efbf bdef bfbd 6bef  ..............k.

And the original one:

0000000: 504b 0304 1400 0000 0800 b81c d33e df1d  PK...........>..
0000010: 3aa0 8102 0000 a405 0000 0500 1c00 422e  :.............B.
0000020: 786d 6c55 5409 0003 5cd4 fc4d 18c7 fc4d  xmlUT...\..M...M
0000030: 7578 0b00 0104 e803 0000 0404 0000 008d  ux..............
0000040: 94df 6bdb 3010 c7df 03f9 1f0e e1bd 254e  ..k.0.........%N
0000050: ec74 6c85 d825 2bac 9442 379a c25e ca8a  .tl..%+..B7..^..

Update 2:
The running software is CentOS 5.6, Perl 5.8.8, Apache 2.2.3.

迷爱 2024-11-23 06:10:53

Does tmpnam return a filehandle marked as utf8? I don't think so!

Try binmode $fh, ":utf8";
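
One way to check whether a handle carries a :utf8 layer is to list its PerlIO layers. The following is a minimal sketch, assuming Perl 5.8 or later, where PerlIO::get_layers is built in. It uses File::Temp's tempfile instead of the question's tmpnam, and the variable names are illustrative only:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file standing in for the upload's temp file (assumption).
my ($fh, $fname) = tempfile(UNLINK => 1);

# Layers as the handle was opened; this depends on PERL_UNICODE,
# "use open" and similar settings in the running environment.
my @initial = PerlIO::get_layers($fh);

# Deliberately push :utf8, then ask for :raw, to see what each call does.
binmode $fh, ':utf8' or die "binmode :utf8 failed: $!";
my @with_utf8 = PerlIO::get_layers($fh);

binmode $fh, ':raw' or die "binmode :raw failed: $!";
my @after_raw = PerlIO::get_layers($fh);

print "initial:    @initial\n";
print "with :utf8: @with_utf8\n";
print "after :raw: @after_raw\n";
```

The same call could presumably be applied to the handle obtained from param in the mod_perl environment and logged, to see which layers are active on the upload handle before any data is read.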

白芷 2024-11-23 06:10:53

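sysread is reading the file as utf8, but the file is not utf8! The first ten bytes are in the Basic Latin range (00-7F), so they are interpreted as the same bytes. The next byte, b8, is not in the valid range, so it is replaced by efbfbd, i.e. \x{FFFD} (the special character that marks a decoding error).
All bytes greater than 7F get replaced by \x{FFFD}.

Which Perl version and OS are you using?
There is a report (perl bug 75106) titled: binmode $fh, ":raw" does not undo :utf8 on win32!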
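
The substitution described in this answer can be reproduced outside of mod_perl. The following is a minimal sketch, assuming the core Encode module: it takes the first 16 bytes of the original archive from the question's hex dump, decodes them as UTF-8 and encodes them again. The variable names are illustrative, and this is not necessarily the exact path the bytes take inside CGI.pm or Apache.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# First 16 bytes of the original zip, copied from the question's hex dump.
my $original = pack 'H*', '504b0304140000000800b81cd33edf1d';

# Decode the raw bytes as if they were UTF-8 text. With the default CHECK
# value, Encode substitutes U+FFFD for each malformed byte.
my $decoded = decode('UTF-8', $original);

# Encode back to bytes; each U+FFFD becomes the three bytes EF BF BD.
my $mangled = encode('UTF-8', $decoded);

my $original_hex = unpack 'H*', $original;
my $mangled_hex  = unpack 'H*', $mangled;

printf "original: %2d bytes  %s\n", length $original, $original_hex;
printf "mangled:  %2d bytes  %s\n", length $mangled,  $mangled_hex;
```

The printed bytes can be compared with the first rows of the uploaded file's dump in the question. The growth in size comes from each replaced byte turning into three.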

千と千尋 2024-11-23 06:10:53

As far as I know, Perl 5 doesn't swap in the replacement character in any of its I/O layers. The only conversions I am aware of are newline conversions (i.e. the text layer). Are you certain the source file does not contain those byte sequences?

This code works for me; does it work for you?

#!/usr/bin/perl

use strict;
use warnings;

use File::Temp qw/:POSIX/;

sub save_uploaded_file {
    # $file is obtained by param(zip) 
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out next 2 lines doesn't help either
    binmode $file, ':raw'
        or die "could not change input file to raw: $!";
    binmode $fh, ':raw'
        or die "could not change tempfile to raw: $!";
    while ($br = sysread($file, $buffer, 16384)) {
        syswrite($fh, $buffer, $br);
    }
    close $fh
        or die "could not close tempfile: $!";
    return $fname;
}

sub check {
    my $input_file = shift;

    print "$input_file is ", -s $input_file, " bytes long\n"; 

    open my $fh, "<:raw", $input_file
        or die "could not open $input_file for reading: $!";

    my $bytes = sysread $fh, my $buf, 4096;

    print "read $bytes bytes: ", 
        join(", ", map { sprintf "%02x", $_ } unpack "C*", $buf),
        "\n";
}

my $input_file = "test.bin";

open my $fh, ">:raw", $input_file
    or die "could not open $input_file for writing: $!";

print $fh pack "CC", 0xFF, 0xFD
    or die "could not write to $input_file: $!";

close $fh
    or die "could not close $input_file: $!";

check $input_file;

open my $newfh, "<", $input_file
    or die "could not open $input_file: $!";
my $new_file = save_uploaded_file $newfh;

check $new_file;
