How do I write a file in Perl whose *file name* contains UTF-8 characters?

Posted 2024-12-01 17:17:52 · 713 characters · 2 views · 0 comments

I am struggling to create a file whose name contains non-ASCII characters.

The following script works fine if it is called with 0 as its parameter, but dies when called with 1.

The error message is open: Invalid argument at C:\temp\filename.pl line 15.

The script is started within cmd.exe.

I expect it to write a file whose name is either äöü.txt or äöü☺.txt, depending on the parameter. But I fail to create the file whose name contains the smiley.

use warnings;
use strict;

use Encode 'encode';

#   Text is stored in utf8 within *this* file.
use utf8;

my $with_smiley = $ARGV[0];

my $filename = 'äöü' . 
  ($with_smiley ? '☺' : '' ).
   '.txt';

open (my $fh, '>', encode('cp1252', $filename)) or die "open: $!";

print $fh "Filename: $filename\n";

close $fh;

I am probably missing something that is obvious to others, but I can't find it, so I'd appreciate any pointer towards solving this.


Comments (3)

ヤ经典坏疍 2024-12-08 17:17:52

First of all, saying "UTF-8 character" is weird. UTF-8 can encode any Unicode character, so the UTF-8 character set is the Unicode character set. That means you want to create a file whose name contains Unicode characters, and more specifically, Unicode characters that aren't in cp1252.

I've answered this on PerlMonks in the past. Answer copied below.


Perl treats file names as opaque strings of bytes. That means that file names need to be encoded as per your "locale"'s encoding (ANSI code page).

In Windows, code page 1252 is commonly used, and thus the encoding is usually cp1252.* However, cp1252 doesn't support Tamil and Hindi characters [or "☺"].
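Whether a given name fits the code page can be checked before calling open. The following is a minimal sketch, assuming only the core Encode module; `encodable_in` is a hypothetical helper name, not something from the original answer. It relies on Encode's `FB_CROAK` check mode, which makes `encode` die on characters the target encoding cannot represent, instead of silently substituting a replacement character.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: true if every character of $name exists in $encoding.
sub encodable_in {
    my ($encoding, $name) = @_;
    # FB_CROAK makes encode() die on unmappable characters;
    # LEAVE_SRC keeps encode() from modifying $name.
    return eval {
        encode($encoding, $name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

# The two names from the question: only the first one fits into cp1252.
for my $name ('äöü.txt', 'äöü☺.txt') {
    print encodable_in('cp1252', $name) ? "encodable\n" : "not encodable\n";
}
```

A check like this lets a script fall back to the wide-character API only for the names that need it.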

Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins**. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.
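The encoding step on its own might look like the sketch below. It assumes that the wide API expects a UTF-16LE byte string ending in a 16-bit NUL; `to_wide_path` is a hypothetical helper name, and the actual CreateFileW call is left out so the snippet runs on any platform.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into the
# NUL-terminated UTF-16LE byte string a wide (W) Win32 call expects.
sub to_wide_path {
    my ($name) = @_;
    # Appending "\0" before encoding yields the two-byte terminator.
    return encode('UTF-16LE', $name . "\0");
}

# Show the bytes for a one-character name (U+263A, the smiley).
my $wide = to_wide_path("\x{263A}");
print join(' ', map { sprintf '%02X', ord } split //, $wide), "\n";
```

Appending the terminator before encoding avoids having to add two zero bytes by hand afterwards.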

Aforementioned Win32::Unicode appears to handle some of the dirty work of using Win32API::File for you. I'd also recommend starting with that.

* — The code page is returned (as a number) by the GetACP system call. Prepend "cp" to get the encoding.

** — Perl's support for Windows sucks in some respects.
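The first footnote can be turned into code roughly as follows. This is a sketch under assumptions: it uses `Win32::GetACP` from the Win32 module that ships with Windows builds of Perl, and `ansi_encoding_name` is a hypothetical helper name. Outside Windows it returns nothing, because there is no ANSI code page to query.

```perl
use strict;
use warnings;

# Hypothetical helper: name of the ANSI code page encoding, e.g. "cp1252".
# Returns undef (empty list in list context) when not running on Windows.
sub ansi_encoding_name {
    return unless $^O eq 'MSWin32';
    require Win32;                    # loaded lazily, only on Windows
    return 'cp' . Win32::GetACP();    # the number returned by GetACP, prefixed with "cp"
}

my $encoding = ansi_encoding_name();
print defined $encoding ? "ANSI encoding: $encoding\n" : "not on Windows\n";
```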

南七夏 2024-12-08 17:17:52

The following runs on Windows 7 with ActiveState Perl. It writes "hello there" to a file with Hebrew characters in its name:

#-----------------------------------------------------------------------
# Unicode file names on Windows using Perl
# Philip R Brenan at gmail dot com, Appa Apps Ltd, 2013
#-----------------------------------------------------------------------

use feature ":5.16";
use Data::Dump qw(dump);
use Encode qw/encode decode/;
use Win32API::File qw(:ALL);

# Create a file with a unicode name

my $e  = "\x{05E7}\x{05EA}\x{05E7}\x{05D5}\x{05D5}\x{05D4}".
         "\x{002E}\x{0064}\x{0061}\x{0074}\x{0061}"; # File name as a Perl character string (Unicode code points)
my $f  = encode("UTF-16LE", $e);  # UTF-16LE bytes, as the wide (W) Win32 API expects
my $g  = eval dump($f);           # Round-trip through Data::Dump to get a plain byte string
   $g .= chr(0).chr(0);           # Terminate the string with a 16-bit NUL
my $F  = Win32API::File::CreateFileW
 ($g, GENERIC_WRITE, 0, [], OPEN_ALWAYS, 0, 0); # Create the file via the Win32 API
say $^E if $^E;                   # Report any Windows error message

# Write to the file

OsFHandleOpen(FILE, $F, "w") or die "Cannot open file";
binmode FILE;                      
print FILE "hello there\n";      
close(FILE);

过期以后 2024-12-08 17:17:52

There is no need to encode the filename (at least not on Linux). This code works on my Linux system:

use warnings;
use strict;

#   Text is stored in utf8 within *this* file.
use utf8;

my $with_smiley = $ARGV[0] || 0;

my $filename = 'äöü' .
  ($with_smiley ? '☺' : '' ).
     '.txt';

open my $fh, '>', $filename or die "open: $!";

binmode $fh, ':utf8';

print $fh "Filename: $filename\n";

close $fh;

HTH, Paul
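A variant that states the file name encoding explicitly, rather than relying on how Perl happens to store the string internally, could look like this. It is a sketch that assumes a Linux-style file system where names are byte strings conventionally interpreted as UTF-8. The temporary directory is only there so the example cleans up after itself.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use File::Temp qw(tempdir);
use File::Spec;

# Work in a throwaway directory that is removed when the script exits.
my $dir      = tempdir(CLEANUP => 1);
my $filename = 'äöü☺.txt';

# Encode the name to UTF-8 bytes ourselves before handing it to open().
my $path = File::Spec->catfile($dir, encode('UTF-8', $filename));

# The file *content* gets its own encoding layer, independent of the name.
open my $fh, '>:encoding(UTF-8)', $path or die "open: $!";
print {$fh} "Filename: $filename\n";
close $fh or die "close: $!";
```

On Windows the same bytes would be interpreted in the ANSI code page, so this approach does not carry over there.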
