How do I write a file whose *filename* contains UTF-8 characters in Perl?
I am struggling to create a file whose name contains non-ASCII characters.
The following script works fine if it is called with 0 as its parameter, but dies when called with 1.
The error message is: open: Invalid argument at C:\temp\filename.pl line 15.
The script is started within cmd.exe.
I expect it to write a file whose name is either äöü.txt or äöü☺.txt, depending on the parameter. But I fail to create the file whose name contains the smiley.
use warnings;
use strict;
use Encode 'encode';
# Text is stored in utf8 within *this* file.
use utf8;
my $with_smiley = $ARGV[0];
my $filename = 'äöü' .
($with_smiley ? '☺' : '' ).
'.txt';
open (my $fh, '>', encode('cp1252', $filename)) or die "open: $!";
print $fh "Filename: $filename\n";
close $fh;
I am probably missing something that is obvious to others, but I can't find it, so I'd appreciate any pointer towards solving this.
Comments (3)
First of all, saying "UTF-8 character" is odd. UTF-8 can encode any Unicode character, so the UTF-8 character set is the Unicode character set. What you want is to create a file whose name contains Unicode characters, and more specifically, Unicode characters that aren't in cp1252.
I've answered this on PerlMonks in the past. Answer copied below.
Perl treats file names as opaque strings of bytes. That means that file names need to be encoded as per your "locale"'s encoding (ANSI code page).
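To make the encoding step concrete, here is a minimal sketch of what happens to the question's two file names when they are encoded with cp1252. The helper name encode_filename_strict is made up for this illustration. By default, Encode replaces a character the target encoding cannot represent with a substitution character, which is "?" for cp1252; since "?" is not allowed in Windows file names, that is presumably what triggers the "Invalid argument" error. Passing Encode::FB_CROAK makes the problem visible before open is ever called.

```perl
use strict;
use warnings;
use utf8;                  # the string literals below are written in UTF-8
use Encode qw(encode);

# Hypothetical helper: return the encoded bytes, or undef if $name contains
# a character that $encoding cannot represent.
sub encode_filename_strict {
    my ($name, $encoding) = @_;
    my $copy = $name;      # with a CHECK argument, encode may modify its input
    return scalar eval { encode($encoding, $copy, Encode::FB_CROAK) };
}

for my $name ('äöü.txt', 'äöü☺.txt') {
    my $lenient = encode('cp1252', $name);   # default: substitutes silently
    my $strict  = encode_filename_strict($name, 'cp1252');
    printf "%d characters: lenient bytes %s a '?', strict encode %s\n",
        length($name),
        ($lenient =~ /\?/ ? 'contain' : 'do not contain'),
        (defined $strict  ? 'succeeds' : 'fails');
}
```

Checking with the strict variant first lets a script fall back to another approach when the name does not fit the ANSI code page.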
In Windows, code page 1252 is commonly used, and thus the encoding is usually cp1252.* However, cp1252 doesn't support Tamil and Hindi characters [or "☺"].
Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins**. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.
The aforementioned Win32::Unicode appears to handle some of the dirty work of using Win32API::File for you. I'd also recommend starting with that.
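The CreateFileW route described above could look roughly like the following. This is an untested sketch, not the answerer's code: the helper names wide_path and open_wide_for_write are invented here, and it assumes Win32API::File is installed (Windows only). The module is loaded lazily and its functions are called by their fully qualified names so that the file still compiles on other systems.

```perl
use strict;
use warnings;
use utf8;                      # the file name literal in the usage note is UTF-8
use Encode qw(encode);
use Symbol qw(gensym);

# Hypothetical helper: encode a file name for the Windows "wide" API as
# UTF-16LE bytes, including the terminating NUL character the C call expects.
sub wide_path {
    my ($name) = @_;
    return encode('UTF-16LE', "$name\0");
}

# Hypothetical helper: create $name for writing via CreateFileW and return
# an ordinary Perl file handle. Windows only.
sub open_wide_for_write {
    my ($name) = @_;
    require Win32API::File;    # loaded only when actually needed

    my $native = Win32API::File::CreateFileW(
        wide_path($name),
        Win32API::File::GENERIC_WRITE(),
        0,                                   # no sharing
        [],                                  # default security attributes
        Win32API::File::CREATE_ALWAYS(),
        0,                                   # no special flags
        [],                                  # no template handle
    ) or die "CreateFileW: $^E";

    # Wrap the native Win32 handle in a Perl file handle.
    my $fh = gensym;
    Win32API::File::OsFHandleOpen($fh, $native, 'w')
        or die "OsFHandleOpen: $^E";
    return $fh;
}
```

On Windows, usage would be along the lines of my $fh = open_wide_for_write('äöü☺.txt'); followed by binmode($fh, ':encoding(UTF-8)') before printing text that contains non-ASCII characters.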
* The code page is returned (as a number) by the GetACP system call. Prepend "cp" to get the encoding.
** Perl's support for Windows sucks in some respects.
The following runs on Windows 7 with ActiveState Perl. It writes "hello there" to a file with Hebrew characters in its name:
There is no need to encode the filename (at least not on Linux). This code works on my Linux system:
HTH, Paul
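The snippet from this last answer is not preserved here. As a stand-in, here is a minimal sketch for a Linux system whose file system takes UTF-8 file names (an assumption; the function name write_named_file and the use of a temporary directory are choices made for this example). Unlike the answer, it encodes the name explicitly as UTF-8, because that does not depend on how Perl happens to store the string internally.

```perl
use strict;
use warnings;
use utf8;                          # the file name literal below is UTF-8
use Encode qw(encode);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write a one-line file named $filename (a character
# string) inside $dir and return the path as a byte string.
sub write_named_file {
    my ($dir, $filename) = @_;
    my $path = File::Spec->catfile($dir, encode('UTF-8', $filename));
    open(my $fh, '>:encoding(UTF-8)', $path) or die "open: $!";
    print {$fh} "Filename: $filename\n";
    close($fh) or die "close: $!";
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);  # removed again when the script exits
my $path = write_named_file($dir, 'äöü☺.txt');
print "file created\n" if -e $path;
```

The same script on Windows would hit the limitation discussed in the first answer, because the ANSI code page rather than UTF-8 determines how the bytes passed to open are interpreted.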