The toEscapedUnicode method generates Unicode without spaces

Posted on 2024-08-24 00:28:39

For this word चौरेउत्तमयादव the escaped Unicode is:
\u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940 \u0930\u0940\u091D\u0941\u092E\u0932 \u091C\u093F\u0935\u0924\u0930\u093E\u092E

Note that it has spaces before \u0930 and \u091C.

But when I try this in my code:

String tempString=Strings.toEscapedUnicode(strString);

the conversion gives a result without spaces:
\u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940\u0930\u0940\u091D\u0941\u092E\u0932\u091C\u093F\u0935\u0924\u0930\u093E\u092E

That is why they do not match.
My 'toEscapedUnicode' call generates Unicode without spaces.
I want the spaces, so how do I get them?

守护在此方 2024-08-31 00:28:39

It isn't a whole answer, but when I copy and paste the Unicode characters "चौरेउत्तमयादव " and then use a couple of tools to analyze what's there, I do not see any spaces:

echo "चौरेउत्तमयादव " | odx

This produces a hex dump of the data; there's a blank at the end, but none in the middle.

0x0000: E0 A4 9A E0 A5 8C E0 A4 B0 E0 A5 87 E0 A4 89 E0   ................
0x0010: A4 A4 E0 A5 8D E0 A4 A4 E0 A4 AE E0 A4 AF E0 A4   ................
0x0020: BE E0 A4 A6 E0 A4 B5 20 0A                        ....... .
0x0029:

And the second command decodes UTF-8 data:

echo "चौरेउत्तमयादव " | utf8-unicode

It produces:

0xE0 0xA4 0x9A = U+091A
0xE0 0xA5 0x8C = U+094C
0xE0 0xA4 0xB0 = U+0930
0xE0 0xA5 0x87 = U+0947
0xE0 0xA4 0x89 = U+0909
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA5 0x8D = U+094D
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA4 0xAE = U+092E
0xE0 0xA4 0xAF = U+092F
0xE0 0xA4 0xBE = U+093E
0xE0 0xA4 0xA6 = U+0926
0xE0 0xA4 0xB5 = U+0935
0x20 = U+0020
0x0A = U+000A

So, it seems that your problem might be with the input to 'toEscapedUnicode' rather than with its output.
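The same check can be done from Java, the question's language, without the external tools. The following is only a minimal sketch: the class name `CodePointDump`, its helper methods, and the sample string are made up for illustration and are not part of the asker's code or of any library. It lists the code points of a string so you can see whether U+0020 is really present before the string reaches 'toEscapedUnicode'.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {
    // Returns one "U+XXXX" label per code point in the input string.
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // True if the string contains an ASCII space (U+0020).
    static boolean hasSpace(String s) {
        return s.indexOf(' ') >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the first four code points of the pasted word,
        // followed by a space and one more letter.
        String input = "\u091A\u094C\u0930\u0947 \u0909";
        System.out.println(codePoints(input));
        System.out.println("contains space: " + hasSpace(input));
    }
}
```

If `hasSpace` returns false for the real input, the spaces were lost before the escaping step, not during it.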

Also, it seems that what I copy'n'paste from the question doesn't match what you say is in the string:

Yours     Mine

\u0938    U+091A
\u0941    U+094C
\u0916    U+0930
\u091A    U+0947
\u0948    U+0909
\u0928    U+0924
\u093E    U+094D
\u0928    U+0924
\u0940    U+092E
\u0020
\u0930    U+092F
\u0940    U+093E
\u091D    U+0926
\u0941    U+0935
\u092E
\u0932
\u0020
\u091C
\u093F
\u0935
\u0924

So, the pasted text does not match the claimed translation for other reasons too.

I believe that the Unicode string you specify should look like:

सुखचैनानी रीझुमल जिवतराम
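As a cross-check in Java, the escaped string from the question can be decoded directly. This is a rough sketch under stated assumptions: `UnescapeDemo` and `unescape` are hypothetical names, and the method handles only four-digit backslash-u escapes, copying every other character (including spaces) through unchanged.

```java
class UnescapeDemo {
    // Decodes four-hex-digit backslash-u escapes; all other characters,
    // including spaces, are copied unchanged. Malformed hex digits make
    // Integer.parseInt throw NumberFormatException.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\u", i) && i + 6 <= escaped.length()) {
                sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(escaped.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escaped string from the question, with its two spaces.
        String escaped = "\\u0938\\u0941\\u0916\\u091A\\u0948\\u0928\\u093E\\u0928\\u0940 "
                + "\\u0930\\u0940\\u091D\\u0941\\u092E\\u0932 "
                + "\\u091C\\u093F\\u0935\\u0924\\u0930\\u093E\\u092E";
        System.out.println(unescape(escaped));
    }
}
```

Whether the output displays correctly depends on the console's encoding and font support.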

I used a file containing the values you claimed, one per line, minus the \u prefixes and with 0020 in place of the blanks.

I then used this home-brew Perl script to generate the UTF-8 string I propose as the equivalent of your escaped Unicode string. I'm sure there are mechanisms available in Perl to do it otherwise (using Unicode-related modules), but this worked for me. (It would be less verbose if I didn't leave the debug code in there.)

#!/bin/perl -w

use strict;
use constant debug => 0;

while (<>)
{
    chomp;
    my $i = hex;
    printf STDERR "0x%04X = %4d\n", $i, $i if debug;
    if ($i < 0x80)
    {
        # 1-byte UTF-8 (ASCII only; 0x80 and above need at least two bytes)
        printf STDERR "  0x%02X (%3d)\n", $i, $i if debug;
        printf "%c", $i;
    }
    elsif ($i < 0x800)
    {
        # 2-byte UTF-8
        my($b1) = 0xC0 | (($i >> 6) & 0xFF);
        my($b2) = 0x80 | ($i & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf "%c%c", $b1, $b2;
    }
    elsif ($i < 0x10000)
    {
        # 3-byte UTF-8
        my($b1) = 0xE0 | (($i >> 12) & 0xFF);
        my($b2) = 0x80 | (($i >>  6) & 0x3F);
        my($b3) = 0x80 | ( $i        & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b3, $b3 if debug;
        printf "%c%c%c", $b1, $b2, $b3;
    }
    else
    {
        # 4-byte UTF-8 or error
        die "Oh bother!";
    }
}
print "\n";

You can fill in the 4-byte UTF-8 and error handling yourself. I don't diagnose invalid code points (notably the UTF-16 surrogates), so if you put bogus Unicode code points in, you will get bogus UTF-8 values out of the script. If you need to know more about that, read Chapter 3 of the Unicode book (available for download, as a chapter, from Unicode.org) or the FAQ on UTF-8, UTF-16, UTF-32 and BOM.
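For readers working in Java rather than Perl, here is a sketch of the same hand-rolled encoding with the 4-byte case and surrogate rejection filled in. The class name `Utf8Encode` and the method `encode` are made up for this example. In real code `String.getBytes(StandardCharsets.UTF_8)` already does this, and the `main` method uses it only as a reference to compare against.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Encode {
    // Encodes one Unicode code point as UTF-8 bytes.
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException(
                    String.format("not a Unicode scalar value: 0x%X", cp));
        }
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        int cp = 0x091A; // first code point of the pasted word; E0 A4 9A in the hex dump
        byte[] mine = encode(cp);
        byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(mine, jdk));
    }
}
```

Comparing against the JDK's own encoder is a cheap way to catch mistakes in the bit masks.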

浅暮の光 2024-08-31 00:28:39

I had a similar situation where I had to display data like this:
"\U0928\U093e\U0936\U092a\U093e\U0924\U0940", which has to be shown as नाशपाती.

I searched a lot for a way to convert it, but the answer I found myself was very simple.

I only had to put the string coming from the JSON into a UILabel (or whatever you want).
In my case, it was something like this:

let meaning = array[indexPath.row] as! NSDictionary
cell.textLabel?.text = meaning.value(forKey: "key") as? String
