Perl's YAML::XS and Unicode
I am trying to use Perl's YAML::XS module on Unicode letters, and it doesn't seem to work the way it should.
I wrote this in a script (saved as UTF-8):
use utf8;
binmode STDOUT, ":utf8";
my $hash = {č => "ř"}; #czech letters with unicode codes U+010D and U+0159
use YAML::XS;
my $s = YAML::XS::Dump($hash);
print $s;
Instead of something sane, -: Å is printed. According to this link, though, it should work fine.
Yes, when I YAML::XS::Load it back, I get the correct strings again, but I don't like that the dumped string seems to be in the wrong encoding.
Am I doing something wrong? To be frank, I am always unsure about Unicode in Perl...
Clarification: my console supports UTF-8. Also, when I print to a file opened with a utf8 handle (open $file, ">:utf8") instead of STDOUT, it still doesn't print the correct UTF-8 letters.
Comments (3)
Yes, you're doing something wrong. You've misunderstood what the link you mentioned means.
Dump and Load work with raw UTF-8 bytes, i.e. strings containing UTF-8 but with the UTF-8 flag off.
When you print those bytes to a filehandle with the :utf8 layer, they get interpreted as Latin-1 and converted to UTF-8, producing double-encoded output (which can be read back successfully as long as you double-decode it). You want to binmode STDOUT, ':raw' instead.
Another option is to call utf8::decode on the string returned by Dump. This converts the raw UTF-8 bytes to a character string (with the UTF-8 flag on). You can then print the string to a :utf8 filehandle.
So, either print the bytes from Dump to a :raw handle, or utf8::decode them first and print to a :utf8 handle.
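The code that originally accompanied these two options did not survive on this page. A minimal sketch of both, reconstructed from the description above rather than taken from the answerer's own code, might look like the following. The variable names are illustrative, and switching STDOUT between layers in one script is done here only to show both options side by side.

```perl
use strict;
use warnings;
use utf8;                     # the literals below are written in UTF-8
use YAML::XS ();

my $hash = { "č" => "ř" };

# Dump returns UTF-8 encoded bytes (UTF-8 flag off).
my $bytes = YAML::XS::Dump($hash);

# Option 1: send the bytes through unchanged with a :raw handle.
binmode STDOUT, ':raw';
print $bytes;

# Option 2: turn the bytes into a character string first,
# then let the :utf8 layer encode it once on output.
my $chars = $bytes;
utf8::decode($chars);         # works in place; returns false on invalid UTF-8
binmode STDOUT, ':utf8';
print $chars;
```

Either way the data is encoded exactly once on its way out, which is what the original script got wrong.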
Likewise, when reading from a file, you want to read in :raw mode or use utf8::encode on the string before passing it to Load.
When possible, you should just use DumpFile and LoadFile, letting YAML::XS deal with opening the file correctly. But if you want to use STDIN/STDOUT, you'll have to deal with Dump and Load.
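For the reading direction, a hedged sketch along the same lines: it assumes a YAML document already held as a character string (as it would be after reading through a :utf8 handle), encodes it before calling Load, and then round-trips the data through DumpFile and LoadFile. The temporary file and variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;
use YAML::XS ();
use File::Temp qw(tempfile);

# A character string, e.g. as read through a :utf8 handle.
my $yaml_chars = "---\nč: ř\n";

# Load expects UTF-8 bytes, so encode the characters first.
my $yaml_bytes = $yaml_chars;
utf8::encode($yaml_bytes);            # in place: characters -> UTF-8 bytes
my $data = YAML::XS::Load($yaml_bytes);

# DumpFile/LoadFile do the file I/O themselves, so no layers are involved.
my (undef, $path) = tempfile(SUFFIX => '.yaml', UNLINK => 1);
YAML::XS::DumpFile($path, $data);
my $again = YAML::XS::LoadFile($path);

binmode STDOUT, ':utf8';
print "$_: $again->{$_}\n" for sort keys %$again;
```

The values that come back from Load and LoadFile are ordinary character strings, so they are printed through a :utf8 handle here.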
It works if you don't use binmode STDOUT, ":utf8"; at all. Just don't ask me why.
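As for the why: without the :utf8 layer the bytes from Dump reach the terminal unchanged, whereas with it every byte is treated as a Latin-1 character and encoded again. A small sketch using only the core Encode module imitates what the layer does; the sample string stands in for Dump's output and is an assumption of this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{10D}: \x{159}";          # "č: ř" as a character string
my $bytes = encode('UTF-8', $chars);     # the kind of byte string Dump returns

# What a :utf8 layer effectively does to a byte string:
# read each byte as a Latin-1 character, then encode it as UTF-8.
my $double = encode('UTF-8', decode('ISO-8859-1', $bytes));

printf "%d characters, %d bytes, %d bytes after double encoding\n",
    length($chars), length($bytes), length($double);
# prints "4 characters, 6 bytes, 10 bytes after double encoding"
```

Each of the two non-ASCII letters occupies two bytes in UTF-8, and each of those four bytes becomes two bytes again when re-encoded, which is where the garbled output comes from.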
I'm using the following for UTF-8 JSON and YAML. There is no error handling, but it shows how to do it.
The code below gives me working \w regexes, lc, uc and so on (at least for my needs), and /á/.
My "boilerplate"...
You can try it with the following in.yaml