How do I get correct non-ASCII command-line arguments in ActiveState Perl?
Running the following command
perl -e "for (my $i = 0; $i < length($ARGV[0]); $i++) {print ord(substr($ARGV[0], $i, 1)), qq{\n}; }" αβγδεζ
in a Windows 7 cmd window with ActiveState Perl v5.14.2 produces the following result:
97
223
63
100
101
63
The above values are nonsensical and don't correspond to any known encoding, so decoding them with the approach recommended in "How can I treat command-line arguments as UTF-8 in Perl?" doesn't help. Changing the active code page of the command window doesn't change the results either.
Your system, like every Windows system I know, uses the 1252 ANSI code page by default, so you could try decoding the arguments from cp1252.

Note that cp1252 cannot represent all of those characters, which is why the console, and thus Perl, actually receives aß?de?.

There is a "Wide" interface for passing (almost) any Unicode code point to a program, but

Sorry, but this is a "you can't" type of situation. You need a different approach. Diomidis Spinellis suggests changing your system's ANSI code page as follows in Win7:

At this point, you'd decode with the ANSI code page associated with the newly selected system locale instead of cp1252 (cp1253 for Greek). Note that using chcp to change the code page of the console window does not affect the code page in which Perl receives its arguments, which is always an ANSI code page. See the examples below (cp737 is the Greek OEM code page, and cp1253 is the Greek ANSI code page. You can find the encodings labeled as 37 and M7 in this document.)
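The decoding step this answer describes can be sketched with the core Encode module. This is a minimal illustration and not the answerer's own code. The helpers arg_encoding, decode_args and codepoints are hypothetical, and treating arguments on non-Windows systems as UTF-8 is an assumption. Win32::GetACP reports the ANSI code page, which is the code page Perl's arguments arrive in regardless of chcp.

The sketch has two parts. The first replays the bytes reported in the question to show that decoding them as cp1252 cannot bring the Greek letters back. The second shows that the letters survive a round trip through cp1253.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: the encoding in which @ARGV arrives. On Windows this
# is the ANSI code page (not the console code page set by chcp); elsewhere
# UTF-8 is assumed.
sub arg_encoding {
    if ($^O eq 'MSWin32') {
        require Win32;
        return 'cp' . Win32::GetACP();
    }
    return 'UTF-8';
}

# Hypothetical helper: decode raw argument bytes into character strings.
sub decode_args {
    my ($enc, @raw) = @_;
    return map { decode($enc, $_) } @raw;
}

# Format a string as a list of code points, e.g. "U+0061 U+00DF".
sub codepoints {
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, shift;
}

# The bytes reported in the question (97 223 63 100 101 63), as received
# under cp1252. Decoding gives "a", sharp s, "?", "d", "e", "?": the Greek
# letters were already replaced before Perl saw them.
my $received = join '', map { chr($_) } 97, 223, 63, 100, 101, 63;
my ($as_1252) = decode_args('cp1252', $received);
print 'cp1252: ', codepoints($as_1252), "\n";

# With a Greek system locale the ANSI code page is cp1253, which can
# represent these letters, so the round trip preserves them.
my $greek = "\x{3B1}\x{3B2}\x{3B3}\x{3B4}\x{3B5}\x{3B6}";    # alpha .. zeta
my $bytes = encode('cp1253', $greek);
my ($as_1253) = decode_args('cp1253', $bytes);
print 'cp1253: ', codepoints($as_1253), "\n";

# In a real script, decode the actual arguments:
my @args = decode_args(arg_encoding(), @ARGV);
print 'args:   ', codepoints($_), "\n" for @args;
```

If you pass arguments, the last section decodes real @ARGV values. On Windows with a Western system locale it will still show the substituted characters, which is the point of the answer.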
This worked for me (on OS X, but it should be portable):

That was for STDIN; for ARGV:

See the -C option in perlrun: http://perldoc.perl.org/perlrun.html#Command-Switches
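For reference, here is a rough in-script equivalent of the -C flags this answer mentions. Per perlrun, A treats the @ARGV elements as UTF-8, and I and O put UTF-8 layers on STDIN and STDOUT. This is a sketch rather than the answerer's command, and the helper decode_utf8_args is hypothetical. It only helps where the arguments really arrive as UTF-8, as on OS X with a UTF-8 locale. On the Windows setup from the question they arrive in the ANSI code page instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Roughly what "perl -CA" arranges: treat each @ARGV element as UTF-8.
# FB_CROAK makes malformed input die; LEAVE_SRC leaves the input untouched.
sub decode_utf8_args {
    return map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @_;
}

# Roughly what "perl -CI" and "perl -CO" arrange: UTF-8 layers on the
# standard input and output handles.
binmode STDIN,  ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

# Without arguments, fall back to the UTF-8 bytes a UTF-8 terminal would
# pass for alpha .. zeta (written as escapes to keep the source ASCII).
my @raw  = @ARGV ? @ARGV : ("\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5\xCE\xB6");
my @args = decode_utf8_args(@raw);

# Same output as the one-liner in the question: one code point per line.
print ord($_), "\n" for split //, $args[0];
```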
If I place the characters in a file (from OS X), copy it to a Windows box (as file.txt), then run:

Then I get the expected output:

But if I copy the contents of file.txt to the command line, I get gibberish. As @ikegami was saying, I don't think it's possible to do this from the command line, since you don't have a UTF-8 locale.
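The answer's exact command and output are not shown above. Below is a minimal sketch of the same file-based workaround, assuming file.txt is saved as UTF-8 (as a file created on OS X typically is). To keep the demo self-contained, a temporary file stands in for file.txt. Because the bytes are read from the file instead of arriving as command-line arguments, they never pass through the Windows ANSI code page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# For a self-contained demo, a temporary file stands in for file.txt and
# receives the UTF-8 bytes of alpha .. zeta, as if saved on OS X.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5\xCE\xB6\n";
close $out or die "Cannot close $file: $!";

# Read it back through a UTF-8 decoding layer, so Perl sees characters
# and no ANSI code page is involved.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot read $file: $!";
chomp(my $line = <$in>);
close $in;

# One code point per line, like the one-liner in the question.
my @codepoints = map { ord($_) } split //, $line;
print "$_\n" for @codepoints;    # prints 945 through 950
```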
You could try using https://metacpan.org/pod/Win32::Unicode::Native. It should have what you need.