How do I upload a binary file with CGI.pm under mod_perl?
I have a large piece of production code that works. But after I set up a new environment in a virtual machine, I have one issue: every time I upload a binary file, it gets mangled by Unicode conversions.
Here is the sub where the problem occurs:
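# assumed: tmpnam comes from File::Temp, which returns ($fh, $filename) in list context
use File::Temp qw(:POSIX);

sub save_uploaded_file
{
    # $file is the upload handle obtained via param('zip')
    my ($file) = @_;
    my ($fh, $fname) = tmpnam;
    my ($br, $buffer);
    # commenting out the next 2 lines doesn't help either
    binmode $file, ':raw';
    binmode $fh, ':raw';
    # copy in 16 KiB chunks until sysread returns 0 (EOF) or undef (error)
    while ($br = sysread($file, $buffer, 16384))
    {
        syswrite($fh, $buffer, $br);
    }
    close $fh;
    return $fname;
}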
It's used to upload zip archives, but they arrive malformed (always larger than the original). When I looked inside them with a hex editor, I found lots of Unicode replacement characters encoded in UTF-8 (EF BF BD).
I found that the total number of bytes read is larger than the original file, so the problem starts at sysread.
Text files upload fine.
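Since the extra bytes already show up at sysread, one quick check is whether a handle carries Perl's :utf8 flag, because on Perl 5.8.8 sysread on such a handle returns characters rather than raw bytes. The sketch below uses the core function PerlIO::get_layers on a scratch file; the helper has_utf8_layer is hypothetical, and it assumes an ordinary filehandle (a tied handle, such as STDIN under mod_perl, may not report layers this way).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the handle carries Perl's :utf8 flag. On
# Perl 5.8.8, sysread() on such a handle returns characters, not raw bytes.
sub has_utf8_layer {
    my ($fh) = @_;
    my $n = grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
    return $n ? 1 : 0;
}

# Demonstrate on a scratch file: set the flag, then clear it with :raw.
my ($fh) = tempfile(UNLINK => 1);

binmode $fh, ':utf8';
my $before = has_utf8_layer($fh);

binmode $fh, ':raw';
my $after = has_utf8_layer($fh);

print "layers now: @{[ PerlIO::get_layers($fh) ]}\n";
print "utf8 before :raw = $before, after :raw = $after\n";
```

Calling has_utf8_layer($file) just before the sysread loop would show whether the upload handle itself is flagged.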
Update:
Here is a hex dump of the first few bytes of the transferred file:
0000000: 504b 0304 1400 0000 0800 efbf bd1c efbf PK..............
0000010: bd3e efbf bd1d 3aef bfbd efbf bd02 0000 .>....:.........
0000020: efbf bd05 0000 0500 1c00 422e 786d 6c55 ..........B.xmlU
0000030: 5409 0003 5cef bfbd efbf bd4d 18ef bfbd T...\......M....
0000040: efbf bd4d 7578 0b00 0104 efbf bd03 0000 ...Mux..........
0000050: 0404 0000 00ef bfbd efbf bdef bfbd 6bef ..............k.
And the original one:
0000000: 504b 0304 1400 0000 0800 b81c d33e df1d PK...........>..
0000010: 3aa0 8102 0000 a405 0000 0500 1c00 422e :.............B.
0000020: 786d 6c55 5409 0003 5cd4 fc4d 18c7 fc4d xmlUT...\..M...M
0000030: 7578 0b00 0104 e803 0000 0404 0000 008d ux..............
0000040: 94df 6bdb 3010 c7df 03f9 1f0e e1bd 254e ..k.0.........%N
0000050: ec74 6c85 d825 2bac 9442 379a c25e ca8a .tl..%+..B7..^..
Update 2:
The system is running CentOS 5.6, Perl 5.8.8, and Apache 2.2.3.
Does tmpnam return a filehandle marked as utf8? I think not! Try binmode $fh, ":utf8";
sysread is reading the file as utf8, but the file is not utf8! The first ten bytes are in "the basic Latin range" (00-7F), so they are interpreted as the same bytes. The next byte, 'b8', is not valid UTF-8 on its own, so it is replaced by 'efbfbd' <=> \x{FFFD} (a special character that indicates a decoding error).
All the bytes greater than 7F are being replaced by \x{FFFD}.
What perl version and OS are you using?
There is a report (perl bug 75106) with title
binmode $fh, ":raw" doesn't undo :utf8 on win32
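The explanation above can be checked against the hex dumps in the question. The sketch below takes the first 16 bytes of the original zip, decodes them as UTF-8 with replacement, and re-encodes the result. Encode's decode/encode are only a stand-in for whatever step did the decoding in the real mod_perl setup; that part is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# First 16 bytes of the original zip, copied from the question's hex dump.
my $original = pack 'H*', '504b0304140000000800b81cd33edf1d';

# Decode the raw bytes as if they were UTF-8 text. With Encode's default
# CHECK, each malformed byte here (all of them >= 0x80) becomes U+FFFD.
my $chars = decode('UTF-8', $original);

# Re-encoding as UTF-8 turns every U+FFFD into the three bytes EF BF BD.
my $mangled = encode('UTF-8', $chars);

printf "original: %2d bytes  %s\n", length $original, unpack('H*', $original);
printf "mangled:  %2d bytes  %s\n", length $mangled,  unpack('H*', $mangled);
```

The 16 input bytes grow to 22, and the output matches the first 22 bytes of the transferred file's dump, which is consistent with each high byte being replaced individually.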
As far as I know, Perl 5 doesn't swap in the replacement character in any of its I/O layers. The only conversions I am aware of are newline conversions (i.e. the text layer). Are you certain the source file does not contain those byte sequences?
This code works for me; does it work for you?
I had what I think is the same problem. The error seemed to occur very early, because none of my code ever executed when a client attempted to upload a binary file. I fixed it by setting STDIN to raw (binary) mode at the top of the script:
binmode(STDIN, ':raw');
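Once no :utf8 layer is involved, a sysread/syswrite loop like the one in the question should copy bytes exactly. The sketch below is a hedged variant of that loop: copy_raw is a hypothetical helper that also checks for read errors and short writes, and the round trip uses local temp files rather than a real upload.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the question): copy all data from $in to $out
# using sysread/syswrite, with read-error and short-write handling.
sub copy_raw {
    my ($in, $out) = @_;
    binmode $in,  ':raw' or die "binmode (in): $!";
    binmode $out, ':raw' or die "binmode (out): $!";
    my $total = 0;
    while (1) {
        my $got = sysread($in, my $buf, 16384);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;                      # EOF
        my $off = 0;
        while ($off < $got) {                   # syswrite may write fewer bytes than asked
            my $put = syswrite($out, $buf, $got - $off, $off);
            die "syswrite failed: $!" unless defined $put;
            $off += $put;
        }
        $total += $got;
    }
    return $total;
}

# Round-trip check: all 256 byte values must come through unchanged.
my $bytes = pack 'C*', 0 .. 255;

my ($src, $src_name) = tempfile(UNLINK => 1);
binmode $src, ':raw';
print {$src} $bytes;
close $src or die "close: $!";

open my $in, '<', $src_name or die "open $src_name: $!";
my ($out, $dst_name) = tempfile(UNLINK => 1);
my $copied = copy_raw($in, $out);
close $in;
close $out or die "close: $!";

open my $check, '<:raw', $dst_name or die "open $dst_name: $!";
my $result = do { local $/; <$check> };
close $check;

printf "copied %d bytes, identical: %s\n", $copied, ($result eq $bytes ? 'yes' : 'no');
```

If this round trip succeeds but uploads are still mangled, the decoding is likely happening before the sub runs, which fits the STDIN fix above.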