How do I open a Unicode file with Perl?

Posted 2024-08-26 01:26:00

I'm using osql to run several sql scripts against a database and then I need to look at the results file to check if any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.

I wrote a little test script to try it, and the output comes out all garbled:

$file = shift;

open OUTPUT, $file or die "Can't open $file: $!\n";
while (<OUTPUT>) {
    print $_;
    if (/Invalid|invalid|Cannot|cannot/) {
        push(@invalids, $file);
        print "invalid file - $file - schedule for retry\n";
        last;
    }            
}

Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.

I think the problem might be that osql writes the results file in UTF-16 format, but I'm not sure. When I open the file in TextPad it just tells me 'Unicode'.

Edit: Using perl v5.8.8
Edit: Hex dump:

file name: Admin_CI.User.sql.results
mime type: 

0000-0010:  ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00  ..1.>... 2.>...M.
0000-0020:  73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00  s.g...1. 5.0.0.7.
0000-0030:  2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00  ,...L.e. v.e.l...
0000-0032:  31 00                                            1.

Comments (4)

温柔女人霸气范 2024-09-02 01:26:00

The file is presumably in UCS-2LE (or UTF-16LE) format.

C:\Temp> notepad test.txt

C:\Temp> xxd test.txt
0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...

When opening such a file for reading, you need to specify the encoding:

#!/usr/bin/perl

use strict; use warnings;

my ($infile) = @ARGV;

open my $in, '<:encoding(UCS-2le)', $infile
    or die "Cannot open '$infile': $!";

Note that the fffe at the beginning is the BOM (byte order mark). Appearing in that byte order, it indicates little-endian data.
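A minimal sketch of how the loop from the question might look with an encoding layer, assuming the results files are UTF-16 with a BOM as in the hex dump. It uses the `UTF-16` encoding name rather than `UCS-2le`: Encode's `UTF-16` decoder reads the BOM to pick the byte order and does not return it as text, whereas the explicit `LE` variants leave it in the data as U+FEFF. The helper name `results_have_error` is hypothetical, and the `:raw` prefix (to avoid byte-level CRLF translation on Windows) is an assumption, not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns true if a results file mentions an error.
# Assumes the file is UTF-16 with a BOM, as in the question's hex dump.
sub results_have_error {
    my ($file) = @_;

    # :raw disables byte-level CRLF translation; :encoding(UTF-16) uses
    # the BOM to determine the byte order while decoding.
    open my $in, '<:raw:encoding(UTF-16)', $file
        or die "Cannot open '$file': $!";

    my $found = 0;
    while ( my $line = <$in> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        if ( $line =~ /Invalid|invalid|Cannot|cannot/ ) {
            $found = 1;
            last;
        }
    }
    close $in;
    return $found;
}

my @invalids;
for my $file (@ARGV) {
    if ( results_have_error($file) ) {
        push @invalids, $file;
        print "invalid file - $file - schedule for retry\n";
    }
}
```

Run it with the results files as arguments. If the matched lines are printed as well, an output layer such as `binmode STDOUT, ':encoding(UTF-8)'` would probably be needed to avoid "Wide character" warnings.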

恍梦境° 2024-09-02 01:26:00

The answer is in the documentation for open, which also points you to perluniintro. :)

open my $fh, '<:encoding(UTF-16LE)', $file or die ...;

You can get a list of the names of the encodings that your perl supports:

% perl -MEncode -le "print for Encode->encodings(':all')"

After that, it's up to you to find out what the file encoding is. This is the same way you'd open any file with an encoding different than the default, whether it's one defined by Unicode or not.

We have a chapter in Effective Perl Programming that goes through the details.
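One rough way to make a first guess at the encoding is to look for a byte order mark in the first few bytes of the file. The sketch below is illustrative only: `guess_bom_encoding` is a hypothetical helper, it recognises only the common Unicode BOMs, and a file without a BOM still has to be identified some other way.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding name from a leading byte order mark.
# Returns undef when no BOM is present (the encoding is then still unknown).
sub guess_bom_encoding {
    my ($file) = @_;

    open my $fh, '<:raw', $file or die "Cannot open '$file': $!";
    my $n = read $fh, my $head, 4;
    close $fh;
    die "Cannot read '$file': $!" unless defined $n;

    # Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks,
    # because FF FE 00 00 also begins with FF FE.
    return 'UTF-32LE' if $head =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $head =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
    return;
}
```

For a file that starts with the bytes ff fe 31 00, as in the question's hex dump, this returns `UTF-16LE`. The returned name could then be placed in the `:encoding(...)` layer passed to `open`. Note that FF FE 00 00 is ambiguous in principle (it could also be UTF-16LE text that starts with a NUL character), so the result is only a guess.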

兔小萌 2024-09-02 01:26:00

Try opening the file with an IO layer specified, e.g.:

open OUTPUT,  "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n";

See perldoc open for more on this.

指尖上得阳光 2024-09-02 01:26:00
    use Carp qw(cluck);

    #
    # -----------------------------------------------------------------------------
    # Reads a file and returns a string; if the second param is 'utf8',
    # the content is read through the :utf8 layer.
    # Returns ( $ret , $msg , $str_file ), where $ret is 0 on success, 1 on failure.
    # usage:
    # ( $ret , $msg , $str_file )
    #         = $objFileHandler->doReadFileReturnString ( $file , 'utf8' ) ;
    # or
    # ( $ret , $msg , $str_file )
    #         = $objFileHandler->doReadFileReturnString ( $file ) ;
    # -----------------------------------------------------------------------------
    sub doReadFileReturnString {

        my $self      = shift;
        my $file      = shift;
        my $mode      = shift ;

        my $msg        = q{} ;
        my $ret        = 1 ;
        my $s          = q{} ;

        # report the first failing check, so $msg names the actual cause
        if    ( ! -e $file ) { $msg = " the file : $file does not exist !!!" ; }
        elsif ( ! -f $file ) { $msg = " the file : $file is not actually a file !!!" ; }
        elsif ( ! -r $file ) { $msg = " the file : $file is not readable !!!" ; }

        if ( $msg ) {
            cluck ( $msg ) ;
            return ( $ret , "$msg can not read the file $file !!!" , undef ) ;
        }

        $s = eval {
             my $string ;
             {
                local $/ = undef;    # slurp the file

                if ( defined ( $mode ) && $mode eq 'utf8' ) {
                    # no trailing space in the name: 3-arg open takes it literally
                    open my $fh, "<:utf8", $file
                      or die "failed to open \$file $file : $!\n" ;
                    $string = <$fh> ;
                    close $fh ;
                    die "did not find utf8 string in file: $file\n"
                        unless utf8::valid ( $string ) ;
                }
                else {
                    open my $fh, "<", $file
                      or die "failed to open \$file $file : $!\n" ;
                    $string = <$fh> ;
                    close $fh ;
                }
             }
            $string ;
         };

         if ( $@ ) {
            # a failed open or an invalid string dies inside the eval
            $msg = $@ ;
            $ret = 1 ;
            $s = undef ;
         } else {
            $ret = 0 ; $msg = "ok for read file: $file" ;
         }
         return ( $ret , $msg , $s ) ;
    }
    #eof sub doReadFileReturnString