How can I open a Unicode file with Perl?
I'm using osql to run several sql scripts against a database and then I need to look at the results file to check if any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.
I wrote a little test script to check this, and the output comes out all garbled:
$file = shift;
open OUTPUT, $file or die "Can't open $file: $!\n";
while (<OUTPUT>) {
    print $_;
    if (/Invalid|invalid|Cannot|cannot/) {
        push(@invalids, $file);
        print "invalid file - $file - schedule for retry\n";
        last;
    }
}
Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.
I think the problem might be that osql puts the result file in UTF-16 format, but I'm not sure. When I open the file in textpad it just tells me 'Unicode'.
Edit: Using perl v5.8.8
Edit: Hex dump:
file name: Admin_CI.User.sql.results
mime type:
0000-0010: ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00 ..1.>... 2.>...M.
0000-0020: 73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00 s.g...1. 5.0.0.7.
0000-0030: 2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00 ,...L.e. v.e.l...
0000-0032: 31 00 1.
Comments (4)
The file is presumably in UCS-2LE (or UTF-16) format.
When opening such a file for reading, you need to specify the encoding, for example by passing an :encoding(UCS-2LE) I/O layer in the mode argument of open.
Note that the ff fe at the beginning is the BOM (byte order mark).
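The advice above (name the encoding when opening, and remember that the leading ff fe is a BOM) could be sketched roughly as follows. The helper name read_ucs2le_lines is made up for illustration, and the sketch assumes the file really is little-endian with a BOM, as the hex dump in the question suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): read a little-endian
# UCS-2/UTF-16 file and return its lines as decoded Perl strings.
sub read_ucs2le_lines {
    my ($path) = @_;

    # The encoding layer decodes the raw bytes into characters on read.
    open my $fh, '<:encoding(UCS-2LE)', $path
        or die "Can't open $path: $!\n";
    my @lines = <$fh>;
    close $fh;

    # With an explicit byte order the BOM is not consumed by the layer;
    # it shows up as U+FEFF at the start of the first line, so drop it.
    $lines[0] =~ s/^\x{FEFF}// if @lines;

    return @lines;
}
```

Called as `my @lines = read_ucs2le_lines('Admin_CI.User.sql.results');`, each element should then be ordinary text that regular expressions can match. This sketch does not account for how the default :crlf layer on Windows interacts with UTF-16 data.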
The answer is in the documentation for open, which also points you to perluniintro. :)
You can get a list of the names of the encodings that your perl supports from the Encode module. After that, it's up to you to find out what the file encoding is. This is the same way you'd open any file with an encoding different from the default, whether it's one defined by Unicode or not.
We have a chapter in Effective Perl Programming that goes through the details.
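One way to list the encodings a given perl supports is through the Encode module's encodings method. The snippet below is only a sketch; the wrapper name supported_encodings is invented here for illustration.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical wrapper: return the names of every encoding that this
# perl's Encode module can load (':all' includes ones not yet loaded).
sub supported_encodings {
    return Encode->encodings(':all');
}

# Print one encoding name per line.
print "$_\n" for supported_encodings();
```

The same list can be produced from the command line with `perl -MEncode -le 'print for Encode->encodings(":all")'`. Names such as UTF-16, UTF-16LE and UCS-2LE should appear in the output on a standard installation.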
Try opening the file with an I/O layer specified, e.g. '<:encoding(UTF-16)' as the mode argument to open.
See perldoc open for more on this.
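Applied to the loop from the question, the I/O layer suggestion might look like the sketch below. The sub name results_file_has_error is hypothetical, and the choice of UTF-16 (which reads the BOM to work out the byte order) is an assumption based on the hex dump rather than something osql documents.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if a BOM-prefixed UTF-16 results file
# contains a line mentioning "invalid" or "cannot", otherwise 0.
sub results_file_has_error {
    my ($path) = @_;

    # :encoding(UTF-16) consumes the BOM and uses it to pick the byte order.
    open my $in, '<:encoding(UTF-16)', $path
        or die "Can't open $path: $!\n";

    my $found = 0;
    while (my $line = <$in>) {
        # Case-insensitive match replaces Invalid|invalid|Cannot|cannot.
        if ($line =~ /invalid|cannot/i) {
            $found = 1;
            last;
        }
    }
    close $in;
    return $found;
}
```

A caller could then write `push @invalids, $file if results_file_has_error($file);` to schedule the file for a retry. Because the lines are decoded before matching, the pattern sees plain characters instead of bytes interleaved with NULs.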