toEscapedUnicode method generates Unicode without spaces

Posted on 2024-08-24 00:28:39, 605 characters, 4 views, 0 comments

For this word, चौरेउत्तमयादव, the Unicode is:
\u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940 \u0930\u0940\u091D\u0941\u092E\u0932 \u091C\u093F\u0935\u0924\u0930\u093E\u092E

Note that it has spaces before \u0930 and \u091C.

But when I try this in my code:

String tempString = Strings.toEscapedUnicode(strString);

the method gives a result without spaces:
\u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940\u0930\u0940\u091D\u0941\u092E\u0932\u091C\u093F\u0935\u0924\u0930\u093E\u092E

That is why the two do not match.
My 'toEscapedUnicode' call generates Unicode without spaces.
I want the spaces, so how do I get them?


Comments (2)

守护在此方 2024-08-31 00:28:39

This isn't a complete answer, but when I copy and paste the Unicode characters "चौरेउत्तमयादव " and run a couple of tools over them, I do not see any spaces:

echo "चौरेउत्तमयादव " | odx

This produces a hex dump of the data; there's a blank at the end, but none in the middle.

0x0000: E0 A4 9A E0 A5 8C E0 A4 B0 E0 A5 87 E0 A4 89 E0   ................
0x0010: A4 A4 E0 A5 8D E0 A4 A4 E0 A4 AE E0 A4 AF E0 A4   ................
0x0020: BE E0 A4 A6 E0 A4 B5 20 0A                        ....... .
0x0029:

And the second command decodes UTF-8 data:

echo "चौरेउत्तमयादव " | utf8-unicode

It produces:

0xE0 0xA4 0x9A = U+091A
0xE0 0xA5 0x8C = U+094C
0xE0 0xA4 0xB0 = U+0930
0xE0 0xA5 0x87 = U+0947
0xE0 0xA4 0x89 = U+0909
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA5 0x8D = U+094D
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA4 0xAE = U+092E
0xE0 0xA4 0xAF = U+092F
0xE0 0xA4 0xBE = U+093E
0xE0 0xA4 0xA6 = U+0926
0xE0 0xA4 0xB5 = U+0935
0x20 = U+0020
0x0A = U+000A

So, it seems that your problem might be with the input to 'toEscapedUnicode' rather than with its output.
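One way to check the input from the Java side is to dump its code points before escaping it. The sketch below is only an illustration: `EscapeCheck`, `escapeNonAscii` and `dumpCodePoints` are hypothetical names, and `escapeNonAscii` is a hand-written stand-in, not the `Strings.toEscapedUnicode` method from the question, whose exact behaviour I have not verified.

```java
class EscapeCheck {
    // Hypothetical stand-in for the question's escaper (not the real library
    // method): code units outside printable ASCII become a backslash-u escape;
    // everything else, including the space U+0020, is copied through unchanged.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Lists every code point as U+XXXX so a missing U+0020 is easy to spot.
    static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A short fragment of the question's text, with one space in it.
        String input = "\u0938\u0941\u0916 \u0930\u0940";
        System.out.println(dumpCodePoints(input));
        System.out.println(escapeNonAscii(input));
    }
}
```

If the dump of your real `strString` shows no U+0020, the spaces were already gone before the escaping step ran.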


Also, what I copy and paste from the question doesn't match what you say is in the string:

Yours     Mine

\u0938    U+091A
\u0941    U+094C
\u0916    U+0930
\u091A    U+0947
\u0948    U+0909
\u0928    U+0924
\u093E    U+094D
\u0928    U+0924
\u0940    U+092E
\u0020
\u0930    U+092F
\u0940    U+093E
\u091D    U+0926
\u0941    U+0935
\u092E
\u0932
\u0020
\u091C
\u093F
\u0935
\u0924

So the pasted text does not match the claimed escape sequence for other reasons too: the code points themselves differ, not just the spacing.


I believe that the Unicode string you specify should look like:

सुखचैनानी रीझुमल जिवतराम
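To reproduce that decoding in Java rather than by hand, a minimal sketch could look like the following. `UnescapeSketch` and `unescape` are hypothetical names; the helper assumes well-formed four-digit escapes and throws `NumberFormatException` on anything else.

```java
class UnescapeSketch {
    // Turns each backslash-u escape with four hex digits back into its UTF-16
    // code unit; all other characters, spaces included, are copied as they are.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\u", i) && i + 6 <= escaped.length()) {
                String hex = escaped.substring(i + 2, i + 6);
                sb.append((char) Integer.parseInt(hex, 16));
                i += 6;
            } else {
                sb.append(escaped.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escaped string from the question, spaces included.
        String escaped = "\\u0938\\u0941\\u0916\\u091A\\u0948\\u0928\\u093E\\u0928\\u0940"
                + " \\u0930\\u0940\\u091D\\u0941\\u092E\\u0932"
                + " \\u091C\\u093F\\u0935\\u0924\\u0930\\u093E\\u092E";
        System.out.println(unescape(escaped));
    }
}
```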

I used a file containing the values you claimed, minus the \u prefixes and with 0020 in place of the blanks:

0938
0941
0916
091A
0948
0928
093E
0928
0940
0020
0930
0940
091D
0941
092E
0932
0020
091C
093F
0935
0924
0930
093E
092E

And then I used this pure home-brew Perl script to generate the UTF-8 string I propose as the equivalent of your escaped Unicode string. I'm sure there are other mechanisms available in Perl to do it (using Unicode-related modules), but this worked for me. (It would be less verbose if I hadn't left the debug code in there.)

#!/bin/perl -w

use strict;
use constant debug => 0;

while (<>)
{
    chomp;
    my $i = hex;
    printf STDERR "0x%04X = %4d\n", $i, $i if debug;
    if ($i < 0x80)
    {
        # 1-byte UTF-8 (ASCII only; U+0080 and above need two or more bytes)
        printf STDERR "  0x%02X (%3d)\n", $i, $i if debug;
        printf "%c", $i;
    }
    elsif ($i < 0x800)
    {
        # 2-byte UTF-8
        my($b1) = 0xC0 | (($i >> 6) & 0x1F);
        my($b2) = 0x80 | ($i & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf "%c%c", $b1, $b2;
    }
    elsif ($i < 0x10000)
    {
        # 3-byte UTF-8
        my($b1) = 0xE0 | (($i >> 12) & 0x0F);
        my($b2) = 0x80 | (($i >>  6) & 0x3F);
        my($b3) = 0x80 | ( $i        & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b3, $b3 if debug;
        printf "%c%c%c", $b1, $b2, $b3;
    }
    else
    {
        # 4-byte UTF-8 or error
        die "Oh bother!";
    }
}
print "\n";

You can fill in the 4-byte UTF-8 case and the error handling. I don't diagnose invalid code points (notably the UTF-16 surrogates), so if you put bogus Unicode code points in, you will get bogus UTF-8 out of the script. If you need to know more about that, read Chapter 3 of the Unicode Standard (available for download, as a chapter, from Unicode.org) or the FAQ on UTF-8, UTF-16, UTF-32 and BOM.
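For comparison, here is a sketch of the same bit manipulation in Java, with the 4-byte case and surrogate rejection filled in. `Utf8Sketch` and `encode` are hypothetical names chosen for illustration; in real code `String.getBytes(StandardCharsets.UTF_8)` already does this, and the `main` method uses it as a cross-check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    // Encodes one Unicode scalar value as UTF-8 by hand.
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException(
                    String.format("not a Unicode scalar value: 0x%X", cp));
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        // Cross-check the hand-rolled encoder against the JDK for a few values.
        int[] samples = { 0x20, 0x0938, 0x1F600 };
        for (int cp : samples) {
            byte[] expected = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X matches JDK: %b%n",
                    cp, Arrays.equals(expected, encode(cp)));
        }
    }
}
```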

浅暮の光 2024-08-31 00:28:39

I had a similar situation where I had to display data like this:
"\U0928\U093e\U0936\U092a\U093e\U0924\U0940", which should render as नाशपाती.

I searched a lot for a way to convert it, but the answer I found myself was very simple.

I only had to assign the string coming from the JSON to a UILabel (or whatever view you want).
In my case, it was something like this:

let meaning = array[indexPath.row] as! NSDictionary
cell.textLabel?.text = meaning.value(forKey: "key") as? String