How to use Perl pack to convert UTF-16 surrogate pairs to UTF-8?
I have input strings which contain text in which some characters are in UTF-16 format and escaped with '\u'. I am trying, in Perl, to convert all the strings to UTF-8. For example, the string 'Alice & Bob & Carol'
might be formatted in the input as:
'Alice \u0026 Bob \u0026 Carol'
To do my desired conversion, I was doing...:
$str =~ s/\\u([A-Fa-f0-9]{4})/pack("U", hex($1))/eg;
...which worked fine until I got to input strings that contained UTF-16 surrogate pairs like:
'Alice \ud83d\ude06 Bob'
How do I modify the above code that uses pack to work with UTF-16 surrogate pairs? I would really like a solution that just uses pack without having to use any additional libraries (JSON::XS, Encode, etc.).
Comments (1)
pack/unpack have no knowledge of UTF-16 text, just UTF-8 (and UTF-EBCDIC). Since you don't want to use a module, you have to decode the surrogate pairs manually.
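A minimal sketch of one way to do that manual decoding, not necessarily the answerer's original code. The helper name decode_u_escapes is hypothetical. It assumes a high surrogate (D800-DBFF) is immediately followed by a low surrogate (DC00-DFFF), combines the two into one code point with the standard UTF-16 formula, and hands the result to pack("U", ...). Any other escape, including an unpaired surrogate, goes through the original single-escape branch unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: decode 4-hex-digit backslash-u escapes,
# combining UTF-16 surrogate pairs into a single code point.
sub decode_u_escapes {
    my ($str) = @_;
    $str =~ s{
        \\u([Dd][89ABab][0-9A-Fa-f]{2})   # high surrogate, D800-DBFF
        \\u([Dd][C-Fc-f][0-9A-Fa-f]{2})   # low surrogate, DC00-DFFF
      |
        \\u([0-9A-Fa-f]{4})               # any other 4-hex-digit escape
    }{
        defined $1
            # code point = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
            ? pack("U", 0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
            : pack("U", hex($3))
    }gex;
    return $str;
}

my $out = decode_u_escapes('Alice \ud83d\ude06 Bob');

# pack("U") yields a character string; utf8::encode (a core function,
# no module to load) converts it in place to UTF-8 bytes for output.
utf8::encode($out);
print "$out\n";
```

For the example input, the pair \ud83d\ude06 works out to 0x10000 + (0x3D << 10) + 0x206 = 0x1F606. Trying the surrogate-pair alternative first matters: if the single-escape alternative came first, each half of the pair would be packed on its own.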