Using Perl to convert numeric hex-formatted UCS2 (unknown LE or BE) to UTF-8
Hoping someone can point me in the direction of where I'm going wrong with this:
I have a string of what I believe is hex-encoded UCS2, but the provider cannot tell me if it is UCS2-LE or UCS2-BE.
Like so: 0627062E062A062806270631
It translates to this: اختبار
In Arabic, apparently... but no matter whether I try converting it out of hex, using it as straight UCS2 (LE or BE), or practically anything else I can think of under the sun, I can't turn it into a native Perl character string so that I can then re-encode it as standard UTF-8 (the native format of our system).
Code:
my $string = "0627062E062A062806270631";
my $decodedHex = hex($string);
#NEAREST
my $perlDecodedUTF8 = decode("UCS-2BE", $decodedHex);
my $utf8 = encode('UTF-8',$perlDecodedUTF8);
open(ARABICTEST,">ucs2test.txt");
print(ARABICTEST $perlDecodedUTF8);
print("Done!");
close(ARABICTEST);
It outputs gibberish characters at the moment.
Now one idea I did come up with was to split the string in question into 4-character sections (i.e. per hex code), but even trying this with an individual, known UCS2 hex value doesn't appear to work.
Also tried forcing the output encoding, no joy there either.
Thanks!
hex is not the way to decode a hex string to a byte sequence; pack is. (hex produces a single integer, not a string of bytes.) Other than that, you were close: use pack('H*', $string) in place of hex($string).

Note: You probably want to use UTF-16BE instead of UCS-2BE. They're basically the same thing, but UTF-16BE allows surrogate pairs, and UCS-2BE doesn't. So all UCS-2BE text is also valid UTF-16BE, but not vice versa.
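
The code that originally accompanied this answer is not preserved here, so what follows is only a minimal sketch of the fix described above, not the answerer's exact code. The helper name hex_utf16_to_utf8 and its default of UTF-16BE are assumptions made for illustration. The file name is the one from the question. Unlike the question's code, it writes the encoded $utf8 bytes rather than the decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): turn a hex-encoded
# UTF-16/UCS-2 string into UTF-8 bytes.
sub hex_utf16_to_utf8 {
    my ($hex, $encoding) = @_;
    $encoding //= 'UTF-16BE';               # assumed default: big-endian
    my $bytes = pack('H*', $hex);           # hex digits -> raw bytes
    my $chars = decode($encoding, $bytes);  # raw bytes -> Perl character string
    return encode('UTF-8', $chars);         # characters -> UTF-8 bytes
}

my $utf8 = hex_utf16_to_utf8('0627062E062A062806270631');

# Write the already-encoded bytes, so open the file without an encoding layer.
open(my $out, '>:raw', 'ucs2test.txt') or die "Cannot open ucs2test.txt: $!";
print {$out} $utf8;
close($out) or die "Cannot close ucs2test.txt: $!";
print "Done!\n";
```

On the byte-order question: passing 'UTF-16LE' as the second argument decodes the same sample to code points such as U+2706, which lie outside the Arabic block (U+0600 to U+06FF), so big-endian looks like the plausible reading for this data.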