Using Perl to convert hex-encoded UCS-2 (LE or BE unknown) to UTF-8

Posted 2024-11-18 11:45:32

Hoping someone can point out where I'm going wrong with this:

I have a string of what I believe is hex-encoded UCS-2, but the provider cannot tell me whether it is UCS-2LE or UCS-2BE.

Like so: 0627062E062A062806270631

It translates to this: اختبار

In Arabic, apparently... but no matter whether I try converting it out of hex, using it as straight UCS-2 (LE or BE), or practically anything else I can think of, I can't turn it into a native Perl (decoded) string so that I can then re-encode it as standard UTF-8 (the native format of our system).
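Since the provider cannot say which byte order is used, it can often be inferred from the data itself. The sketch below is a heuristic, not a standard API: guess_byte_order is a hypothetical helper, and it assumes the text comes mostly from one Unicode block (true for this Arabic sample), so the high byte of each 16-bit unit varies less than the low byte. A leading byte order mark (FE FF or FF FE), when present, takes precedence. It uses Encode's UTF-16BE/UTF-16LE names, which decode text like this exactly as UCS-2 would.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): guess the byte order of
# hex-encoded UCS-2/UTF-16 data. Returns (encoding name, raw bytes).
sub guess_byte_order {
    my ($hex) = @_;
    my $bytes = pack('H*', $hex);
    die "Odd number of bytes; not UCS-2/UTF-16\n" if length($bytes) % 2;

    # A byte order mark settles the question; strip it before decoding.
    return ('UTF-16BE', substr($bytes, 2)) if $bytes =~ /\A\xFE\xFF/;
    return ('UTF-16LE', substr($bytes, 2)) if $bytes =~ /\A\xFF\xFE/;

    # Heuristic: in text from one Unicode block, the high byte of each
    # 16-bit unit varies less than the low byte (here it is always 0x06).
    my (%first, %second);
    for my $unit (unpack('(a2)*', $bytes)) {
        my ($b0, $b1) = unpack('C2', $unit);
        $first{$b0}++;
        $second{$b1}++;
    }
    # Fewer distinct values in the first byte => first byte is the high byte => BE.
    my $be = scalar(keys %first) <= scalar(keys %second);
    return ($be ? 'UTF-16BE' : 'UTF-16LE', $bytes);
}

my ($encoding, $bytes) = guess_byte_order("0627062E062A062806270631");
my $text = decode($encoding, $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "$encoding: $text\n";
```

For the sample string it picks UTF-16BE, because every first byte is 0x06. Very short strings, or text that mixes scripts, can fool the heuristic, so treat the result as a guess rather than a guarantee.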

Code:

my $string = "0627062E062A062806270631";
my $decodedHex = hex($string);

#NEAREST
my $perlDecodedUTF8 = decode("UCS-2BE", $decodedHex);
my $utf8 = encode('UTF-8',$perlDecodedUTF8);

open(ARABICTEST,">ucs2test.txt");
print(ARABICTEST $perlDecodedUTF8);
print("Done!");
close(ARABICTEST);

It outputs gibberish characters at the moment.
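The gibberish starts at the hex() call. A minimal sketch contrasting the two built-ins on the sample string (variable names are illustrative only):

```perl
use strict;
use warnings;

my $string = "0627062E062A062806270631";

# pack('H*', ...) turns every pair of hex digits into one byte: 24 digits -> 12 bytes.
my $bytes = pack('H*', $string);
printf "pack gave %d bytes: %s\n",
    length($bytes), join(' ', map { sprintf('%02X', $_) } unpack('C*', $bytes));

# hex() reads the WHOLE string as one number. 96 bits do not fit in a native
# integer, so Perl converts it to an approximate floating-point value (and
# warns under "use warnings"); either way the result is not a byte string.
my $number = do { no warnings; hex($string) };
print "hex gave one number: $number\n";
```

On the sample, the first line prints 12 bytes: 06 27 06 2E 06 2A 06 28 06 27 06 31, that is, six 16-bit code units, each starting with 0x06. Those bytes, not the number, are what decode() expects.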

One idea I had was to split the string into 4-character sections (one per hex code), but even trying this with a single, known UCS-2 hex value doesn't seem to work.
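For what it's worth, the splitting idea can work when each 4-digit group goes through hex() on its own and the resulting code point through chr(). A sketch, assuming big-endian digit order as written and no surrogate pairs (chr() would turn each half of a pair into a separate, invalid character, which is why decode() is the more general route):

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

my $string = "0627062E062A062806270631";

# Take 4 hex digits at a time (one 16-bit code unit, big-endian as written),
# convert each group to a number with hex(), then to a character with chr().
my @code_points = map { hex($_) } $string =~ /([0-9A-Fa-f]{4})/g;
my $text = join '', map { chr($_) } @code_points;

printf "U+%04X\n", $_ for @code_points;
print "$text\n";
```

This prints U+0627, U+062E, U+062A, U+0628, U+0627, U+0631 and then the Arabic word. Any trailing digits that do not form a full group of four are silently ignored.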

Also tried forcing the output encoding, no joy there either.

Thanks!

Answer by 标点 (2024-11-25 11:45:32)

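hex is not the way to turn a hex string into a byte sequence; pack is. (hex interprets the whole string as a single integer rather than producing a string of bytes, and a 24-digit hex number is too large for a native integer anyway.) Other than that, you were close. Try this: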

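use strict;
use warnings;
use Encode qw(decode);

my $string = "0627062E062A062806270631";

# pack('H*', ...) turns each pair of hex digits into one byte (24 digits -> 12 bytes).
my $decodedHex = pack('H*', $string);

# Decode those bytes as big-endian UCS-2 into a Perl character string.
my $perlDecodedUTF8 = decode("UCS-2BE", $decodedHex);

# The :encoding(UTF-8) layer encodes the characters as UTF-8 on output.
open(my $ARABICTEST, ">:encoding(UTF-8)", "ucs2test.txt")
    or die "Cannot open ucs2test.txt: $!";
print $ARABICTEST $perlDecodedUTF8;
close($ARABICTEST) or die "Cannot close ucs2test.txt: $!";
print("Done!\n");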
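The question's code already calls encode('UTF-8', ...) but then prints the decoded string instead. If you would rather do that encoding step yourself than rely on a UTF-8 output layer, write the encoded bytes through a :raw handle. A sketch, reusing the answer's file name and decoding as UTF-16BE per the note that follows:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $string = "0627062E062A062806270631";
my $chars  = decode('UTF-16BE', pack('H*', $string));   # Perl character string
my $utf8   = encode('UTF-8', $chars);                    # UTF-8 byte string

# $utf8 already holds encoded bytes, so write it through a :raw handle.
# Printing it through an :encoding(UTF-8) or :utf8 handle would encode it twice.
open(my $fh, '>:raw', 'ucs2test.txt') or die "Cannot open ucs2test.txt: $!";
print $fh $utf8;
close($fh) or die "Cannot close ucs2test.txt: $!";
```

Either approach works; the important part is to encode exactly once, either explicitly with encode() or implicitly via the output layer.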

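The code assumes big-endian order, which fits this sample: read as BE, every code unit is 06xx, a letter in the Arabic block (U+0600 to U+06FF); read as LE, the units would be unrelated symbols such as U+2706.

Note: you probably want to use UTF-16BE instead of UCS-2BE. The two are identical for characters in the Basic Multilingual Plane (U+0000 to U+FFFF), but UTF-16BE also allows surrogate pairs, which encode characters above U+FFFF, and UCS-2BE doesn't. So all UCS-2BE text is also valid UTF-16BE, but not vice versa.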
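A small sketch of that difference, using only the Encode module that ships with Perl: for BMP-only input like the sample the two encodings agree, while a surrogate pair (here D83D DE00, the UTF-16 form of U+1F600) decodes under UTF-16BE to a single character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# For BMP-only input such as the sample, both names give the same string.
my $bytes = pack('H*', "0627062E062A062806270631");
my $same  = decode('UCS-2BE', $bytes) eq decode('UTF-16BE', $bytes);

# A character above U+FFFF needs a surrogate pair; UTF-16BE combines the
# two 16-bit units D83D and DE00 into ONE character, U+1F600.
my $emoji = decode('UTF-16BE', pack('H*', 'D83DDE00'));

printf "same for BMP: %s; U+%X, length %d\n",
    ($same ? 'yes' : 'no'), ord($emoji), length($emoji);
```

This prints "same for BMP: yes; U+1F600, length 1".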
