Can a Perl script saved in UTF-8 encoding open files whose names are encoded in GB2312?
I'm not talking about reading in the file content in utf-8 or non-utf-8 encoding and stuff. It's about file names. Usually I save my Perl script in the system default encoding, "GB2312" in my case and I won't have any file open problems. But for processing purposes, I'm now having some Perl script files saved in utf-8 encoding. The problem is: these scripts cannot open the files whose names consist of characters encoded in "GB2312" encoding and I don't like the idea of having to rename my files.
Does anyone happen to have any experience in dealing with this kind of situation? Thanks like always for any guidance.
Edit
Here's the minimized code to demonstrate my problem:
# I'm running ActivePerl 5.10.1 on Windows XP (Simplified Chinese version)
# The file system is NTFS
#!perl -w
use autodie;
my $file = "./测试.txt"; #the file name consists of two Chinese characters
open my $in,'<',"$file";
while (<$in>){
print;
}
This test script runs well if saved in "ANSI" encoding (I assume ANSI encoding is the same as GB2312, which is used to display Chinese characters). But it won't work if saved as "UTF-8", and the error message is as follows:
Can't open './娴嬭瘯.txt' for reading: 'No such file or directory'.
In this warning message, "娴嬭瘯" are meaningless junk characters.
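The junk characters are probably not random. A minimal sketch, assuming the console and file system use code page 936 (the Windows superset of GB2312, named "cp936" in the Encode module): the six UTF-8 bytes of 测试 are reinterpreted as three double-byte characters, which should reproduce the text in the error message. The variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this file is saved as UTF-8
use Encode qw(encode decode);

my $name       = "测试";                         # two characters
my $utf8_bytes = encode("UTF-8", $name);         # the bytes stored in a UTF-8 script
my $misread    = decode("cp936", $utf8_bytes);   # the same bytes read as cp936 text

# Show the raw bytes in hex, then the misinterpreted text.
my $hex = join " ", map { sprintf("%02X", ord($_)) } split(//, $utf8_bytes);

binmode STDOUT, ":encoding(UTF-8)";
print "UTF-8 bytes:   $hex\n";
print "read as cp936: $misread\n";
```

So open is asked for a file whose name is those six bytes, and no file with that name exists on a system that expects GB2312-style names.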
Update
I tried first encoding the file name as GB2312 but it does not seem to work :(
Here's what I tried:
#!perl -w
use autodie;
use Encode;
my $file = "./测试.txt";
encode("gb2312", decode("utf-8", $file));
open my $in,'<',"$file";
while (<$in>){
print;
}
My current thinking is: the file name in my OS is 测试.txt but it is encoded as GB2312. In the Perl script the file name looks the same to human eyes, still 测试.txt. But to Perl, they are different because they have different internal representations. But I don't understand why the problem persists when I already converted my file name in Perl to GB2312 as shown in the above code.
Update
I made it, finally made it :)
@brian's suggestion is right. I made a mistake in the code above: I didn't assign the encoded file name back to $file.
Here's the solution:
#!perl -w
use autodie;
use Encode;
my $file = "./测试.txt";
$file = encode("gb2312", decode("utf-8", $file));
open my $in,'<',"$file";
while (<$in>){
print;
}
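The difference between the failing and the working version can be sketched as follows. encode returns a new string and leaves its argument alone, so the result has to be kept. The helper to_fs_name and its default of "cp936" are hypothetical, chosen for a Simplified Chinese Windows system, and the file name is written as explicit UTF-8 bytes so the sketch does not depend on how it is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: turn a file name held as UTF-8 bytes into the bytes
# the file system expects. The "cp936" default is an assumption.
sub to_fs_name {
    my ($utf8_bytes, $fs_encoding) = @_;
    $fs_encoding = "cp936" unless defined $fs_encoding;
    return encode($fs_encoding, decode("UTF-8", $utf8_bytes));
}

my $file   = "./\xE6\xB5\x8B\xE8\xAF\x95.txt";   # "./测试.txt" as UTF-8 bytes
my $before = $file;

to_fs_name($file);                  # result thrown away, as in the failing script
print "file name unchanged\n" if $file eq $before;

my $fs_file = to_fs_name($file);    # keep the returned value, as in the fix
printf "%d bytes before, %d bytes after\n", length($file), length($fs_file);
```

The value in $fs_file is what would be passed to open on such a system.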
Comments (1)
If you "use utf8" in your Perl script, that merely tells perl that the source is in UTF-8. It doesn't affect how perl deals with the outside world. Are you turning on any other Perl Unicode features?
Are you having problems with every filename, or just some of them? Can you give us some examples, or a small demonstration script? I don't have a filesystem that encodes names as GB2312, but have you tried encoding your filenames as GB2312 before you call open?
If you want specific strings encoded with a specific encoding, you can use the Encode module. Try that with the filenames that you give to open.
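A small sketch of what the utf8 pragma changes, assuming the script itself is saved as UTF-8 and that the target encoding is "cp936" (an assumption for a Simplified Chinese Windows system). Without the pragma a literal is a string of bytes. With it, the literal is already a string of characters, so only the encode step should be needed before calling open.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The same literal, compiled without and with the utf8 pragma.
my $as_bytes = do { no utf8;  "测试" };   # raw UTF-8 bytes from the source file
my $as_chars = do { use utf8; "测试" };   # decoded characters

print "without utf8: length ", length($as_bytes), "\n";
print "with utf8:    length ", length($as_chars), "\n";

# With characters in hand, encode once for the file system.
my $fs_name = encode("cp936", $as_chars);
print "cp936 bytes:  ", join(" ", map { sprintf("%02X", ord($_)) } split(//, $fs_name)), "\n";
```

Either way, the pragma only changes how the source is read. The bytes handed to open still have to be produced explicitly.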