How to convert a Unicode file to an ASCII file in a Perl script on a Windows machine

Posted 2024-12-15 17:41:59 · 97 characters · 0 views · 0 comments

I have a file in Unicode format on a Windows machine. Is there any way to convert it to ASCII format on a Windows machine using a Perl script?

It's UTF-16 with a BOM.

Comments (2)

李不 2024-12-22 17:42:01

Take a look at the encoding option of Perl's open function. You can specify the encoding when opening a file for reading or writing.

Something like this should work:

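#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# The input starts with a BOM, so plain "UTF-16" lets Perl detect the byte
# order and strip the BOM. "UTF-16BE" would not treat the BOM specially.
open(my $utf16_fh, "<:encoding(UTF-16)", "test.utf16.txt");
# The output name is a placeholder; pick any file you want to write to.
open(my $ascii_fh, ">:encoding(ASCII)", "test.ascii.txt");

# Copy line by line; each line is decoded on read and encoded on write.
while (my $line = <$utf16_fh>) {
    print $ascii_fh $line;
}

close $utf16_fh;
close $ascii_fh;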
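
Since the question is about Windows, one thing worth checking is line endings. On Windows, Perl's default :crlf layer works on raw bytes below the :encoding layer, which is a commonly reported source of trouble with UTF-16 files. The sketch below is one possible way around that, not a tested Windows recipe: it opens the input with :raw and strips the carriage return by hand. The file names and the utf16_to_ascii helper are made up for illustration, and the script writes its own small sample file so that it can be run as is.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Encode qw(encode);

# Hypothetical file names, used only for this sketch.
my ($in_name, $out_name) = ('sample.utf16.txt', 'sample.ascii.txt');

# Write a small UTF-16LE sample with a BOM and CRLF line endings.
open(my $mk, '>:raw', $in_name);
print {$mk} "\xFF\xFE" . encode('UTF-16LE', "hello\r\nworld\r\n");
close($mk);

# Hypothetical helper: copy $src (UTF-16 with BOM) to $dst as ASCII.
sub utf16_to_ascii {
    my ($src, $dst) = @_;
    # :raw keeps any default :crlf layer away from the UTF-16 bytes;
    # "UTF-16" picks the byte order from the BOM.
    open(my $in,  '<:raw:encoding(UTF-16)', $src);
    open(my $out, '>:encoding(ASCII)', $dst);
    while (my $line = <$in>) {
        $line =~ s/\r\n\z/\n/;   # drop the CR of a CRLF pair after decoding
        print {$out} $line;
    }
    close($in);
    close($out);
    return;
}

utf16_to_ascii($in_name, $out_name);
print "wrote $out_name\n";
```

The output handle is left with its default layers, so the result should get the platform's native line endings.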

深海夜未眠 2024-12-22 17:41:59

If you want to convert Unicode to ASCII, be aware that some characters can't be converted, because they simply don't exist in ASCII.
If you can live with that, you can try this:

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

use open IN => ':encoding(UTF-16)';
use open OUT => ':encoding(ascii)';

my $buffer;

open(my $ifh, '<', 'utf16bom.txt');
read($ifh, $buffer, -s $ifh);
close($ifh);

open(my $ofh, '>', 'ascii.txt');
print($ofh $buffer);
close($ofh);

If you do not have autodie, just remove that line. You should then check each open/close call yourself, for example:

open(...) or die "error: $!\n";

If you have characters that can't be converted, you will get warnings on the console and your output file will have e.g. text like

\x{00e4}\x{00f6}\x{00fc}\x{00df}

in it.
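
The escapes shown above can be reproduced without any files. This is a small sketch using the Encode module directly. It assumes that the :encoding layer's default fallback is Encode's "perlqq" mode with warnings (the PerlIO::encoding documentation describes a $PerlIO::encoding::fallback variable for this), so treat that link as an assumption and check it against your Perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text with four non-ASCII characters, written as escapes so that
# this script itself stays pure ASCII.
my $text = "Gr\x{fc}\x{df}e: \x{e4}\x{f6}";

# FB_PERLQQ turns each unencodable character into a \x{HHHH} escape.
# LEAVE_SRC stops encode() from modifying $text in place.
my $perlqq = encode('ascii', $text,
                    Encode::FB_PERLQQ() | Encode::LEAVE_SRC());

# Without a CHECK argument, encode() substitutes "?" instead.
my $subst = encode('ascii', $text);

print "$perlqq\n";
print "$subst\n";
```

If you would rather stop at the first unconvertible character than write escapes, Encode also offers FB_CROAK, which makes encode() die instead.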

BTW: If you don't have a BOM but know the file is big-endian (or little-endian), you can change the encoding line to

use open IN => ':encoding(UTF-16BE)';

or

use open IN => ':encoding(UTF-16LE)';
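
If you are not sure whether a given file starts with a BOM, you can look at its first two bytes before choosing the layer. The following is only a sketch: pick_utf16_layer and the sample file names are hypothetical, and the fallback byte order is something you have to know in advance, since it cannot be detected reliably without a BOM.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the :encoding layer to use for $path.
# $fallback is the byte order you already know ("UTF-16LE" or "UTF-16BE")
# and is used only when the file does not start with a BOM.
sub pick_utf16_layer {
    my ($path, $fallback) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my $count = read($fh, my $head, 2);
    close($fh);
    my $has_bom = defined $count && $count == 2
        && ($head eq "\xFF\xFE" || $head eq "\xFE\xFF");
    return $has_bom ? ':encoding(UTF-16)' : ":encoding($fallback)";
}

# Demo with two throwaway files (names are made up for this sketch).
my %samples = (
    'with_bom.txt'    => "\xFF\xFEh\0i\0",   # BOM + "hi" in UTF-16LE
    'without_bom.txt' => "h\0i\0",           # "hi" in UTF-16LE, no BOM
);
for my $name (sort keys %samples) {
    open(my $fh, '>:raw', $name) or die "Cannot write $name: $!\n";
    print {$fh} $samples{$name};
    close($fh);

    my $layer = pick_utf16_layer($name, 'UTF-16LE');
    open(my $in, "<$layer", $name) or die "Cannot open $name: $!\n";
    my $text = <$in>;
    close($in);
    print "$name: $layer -> $text\n";
}
```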

Hope it works under Windows as well. I can't give it a try right now.
