How to convert Unicode-encoded data to Devanagari (Hindi) text

Posted on 2024-11-20 00:17:08

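I am receiving SMS messages written in Devanagari (Hindi) script from a mobile phone into a desktop program, but the data shows up encoded (for example, 091A09470924002009240924). I found out that this is Unicode. Is there an existing library that can convert it to Hindi text? If not, how would I write a method to do it? I am using C#.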

诠释孤独 2024-11-27 00:17:08

Use the System.Text.Encoding class. It has a GetChars(byte[]) method. You may also need a suitable font, because some Hindi symbols can be written in more than one way.
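
A minimal sketch of how this could be applied to the sample in the question. It assumes the string is hexadecimal UTF-16 with the high byte first, which fits the sample (091A, 0947 and 0924 all fall in the Devanagari block), but the phone or modem may use a different format. The class name HexSmsDecoder and the method name DecodeUtf16BeHex are hypothetical; only Convert.ToByte and Encoding.BigEndianUnicode come from the framework.

```csharp
using System;
using System.Text;

public static class HexSmsDecoder
{
    // Hypothetical helper: decodes a hex string such as "091A0947..."
    // ASSUMPTION: every four hex digits are one UTF-16 code unit, high byte first.
    public static string DecodeUtf16BeHex(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Expected a multiple of four hex digits.", nameof(hex));

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Each pair of hex digits becomes one byte.
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }

        // BigEndianUnicode is UTF-16 with the high byte of each code unit first.
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

Calling HexSmsDecoder.DecodeUtf16BeHex("091A09470924002009240924") should return the characters U+091A, U+0947, U+0924, a space, then U+0924 twice, which displays as चेत तत when the font supports Devanagari.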

ζ澈沫 2024-11-27 00:17:08

Here's a code snippet I used to convert Georgian Unicode text to its Latin equivalent.

using System;
using System.Text;

// Latin equivalents of the 33 Georgian letters U+10D0..U+10F0, in code point order.
string[] charset = new string[33] { "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l", "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q", "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h" };
string unicodeString = "აბ, - გდ";
string latin_string = "";
// UTF-16 little-endian: two bytes per char, low byte first.
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeString);
for (int p = 0; p < unicodeBytes.Length / 2; p++)
{
    byte low = unicodeBytes[p * 2];
    byte high = unicodeBytes[p * 2 + 1];
    // Georgian letters have high byte 0x10 and low byte 208..240.
    if (high == 0x10 && low > 207 && low < 241)
        latin_string += charset[low - 208];
    else
        // Anything else is copied through unchanged.
        latin_string += unicodeString[p];
}

Explaining only the necessary part:

Encoding.Unicode.GetBytes(unicodeString) returns a byte array whose length is 2 * unicodeString.Length, so every letter in unicodeString is represented by a pair of bytes.
The original answer attached an image here to illustrate this.

The even indexes of unicodeBytes hold the low byte of each letter, and the odd indexes hold the high byte. The Georgian letters are U+10D0 to U+10F0, so their high byte is 0x10 and their low byte runs from 208 to 240 (33 letters in total). If the low byte is in the range [208, 240] and the high byte is 0x10, I use the charset string array to get the Latin equivalent; otherwise the character is copied through unchanged.

I don't know whether there is a library for this, but this method should give you a basic idea of how to write your own converter.
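
The same table lookup can also be written against the chars of the string directly, which avoids the byte array altogether. The following is only a sketch of that variant: the class name GeorgianTransliterator and the method name ToLatin are hypothetical, and it assumes the input uses the Georgian letters U+10D0 to U+10F0 with the same Latin table as above.

```csharp
using System;
using System.Text;

public static class GeorgianTransliterator
{
    // Latin equivalents of U+10D0..U+10F0, in code point order (table from the answer).
    private static readonly string[] Charset =
    {
        "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l",
        "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q",
        "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h"
    };

    // Hypothetical helper: maps Georgian letters to Latin, leaves other chars as-is.
    public static string ToLatin(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= '\u10D0' && c <= '\u10F0')
                result.Append(Charset[c - '\u10D0']); // offset into the table
            else
                result.Append(c);                     // punctuation, spaces, other scripts
        }
        return result.ToString();
    }
}
```

With the sample string from the answer, GeorgianTransliterator.ToLatin("აბ, - გდ") should give "ab, - gd". A StringBuilder is used instead of repeated string concatenation so that long inputs do not allocate a new string per character.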
