How to convert Unicode-encoded data to Devanagari (Hindi) text

Posted 2024-11-20 00:17:08

I am receiving SMS messages in the Devanagari (Hindi) script from my mobile phone into my desktop program, but it is displaying the data in an encoding (e.g. 091A09470924002009240924) which I found out is Unicode. Is there an existing library that will allow me to convert this to Hindi text? If not, how do I go about writing a method for this? I'm using C#.

Comments (3)

诠释孤独 2024-11-27 00:17:08

Use the System.Text.Encoding class; it has a GetChars(byte[]) method. You'll probably also need an appropriate font, since some Hindi characters can be written in several ways.
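A minimal sketch of that approach, assuming each group of four hex digits in the SMS payload (e.g. 091A) is one UTF-16 code unit written high byte first. The class and method names HexUnicodeDecoder.Decode are hypothetical. It parses the hex pairs into bytes and passes them to Encoding.BigEndianUnicode.GetChars:

```csharp
using System;
using System.Text;

public static class HexUnicodeDecoder
{
    // Hypothetical helper: decodes a hex string of UTF-16 big-endian code units
    // (four hex digits per code unit, e.g. "091A" = U+091A) into a .NET string.
    public static string Decode(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Expected a multiple of 4 hex digits.", nameof(hex));

        // Parse each pair of hex digits into one byte.
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);

        // BigEndianUnicode reads the high byte of each code unit first,
        // matching the digit order in "091A".
        char[] chars = Encoding.BigEndianUnicode.GetChars(bytes);
        return new string(chars);
    }

    public static void Main()
    {
        string text = Decode("091A09470924002009240924");
        // Displaying Devanagari still needs a console/control and font that support it.
        Console.WriteLine(text);
    }
}
```

For the sample in the question, Decode returns U+091A U+0947 U+0924, a space, then U+0924 U+0924, i.e. "चेत तत". If the payload turned out to be low byte first, Encoding.Unicode would be the encoding to use instead.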

ζ澈沫 2024-11-27 00:17:08

Here's a code snippet I used to convert Georgian Unicode text to its Latin equivalent.

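using System.Text;

// Latin equivalents of the 33 Georgian letters U+10D0 (ა) .. U+10F0 (ჰ)
string[] charset = new string[33] { "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l", "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q", "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h" };
string unicodeString = "აბ, - გდ";
var latin = new StringBuilder();
// Encoding.Unicode is UTF-16 little-endian: byte 2p is the low byte of character p, byte 2p+1 its high byte
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeString);
for (int p = 0; p < unicodeBytes.Length / 2; p++)
{
    byte low = unicodeBytes[p * 2], high = unicodeBytes[p * 2 + 1];
    if (high == 0x10 && low >= 208 && low <= 240)   // U+10D0..U+10F0
        latin.Append(charset[low - 208]);
    else
        latin.Append(unicodeString[p]);             // copy any other character unchanged
}
string latin_string = latin.ToString();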

Explaining only the necessary part:

Encoding.Unicode.GetBytes(unicodeString) returns a byte array whose length is 2 * unicodeString.Length, so every character of unicodeString gets a pair of bytes. Encoding.Unicode is UTF-16 little-endian, so the low byte of each pair comes first.
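To make the byte order concrete, here is a small sketch (the class name ByteLayoutDemo is made up for illustration) comparing Encoding.Unicode (UTF-16 little-endian) with Encoding.BigEndianUnicode for the first Georgian letter ა (U+10D0). Little-endian puts the low byte first, which is why the letter's low byte sits at the even indexes; hex strings like the 091A... in the question appear to be written high byte first, i.e. big-endian:

```csharp
using System;
using System.Text;

public static class ByteLayoutDemo
{
    // Formats a byte array as space-separated hex pairs, e.g. "D0 10".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        string georgian = "\u10D0"; // ა, the first Georgian letter

        // UTF-16 little-endian: low byte first (GetBytes writes no byte-order mark).
        byte[] little = Encoding.Unicode.GetBytes(georgian);
        // UTF-16 big-endian: high byte first.
        byte[] big = Encoding.BigEndianUnicode.GetBytes(georgian);

        Console.WriteLine(Hex(little)); // D0 10
        Console.WriteLine(Hex(big));    // 10 D0
    }
}
```

The low byte 0xD0 is 208, the start of the range used in the snippet above.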
The even indexes of unicodeBytes hold the low byte of each character, which identifies the letter you want to decode. The Georgian letters occupy U+10D0 to U+10F0, so their low bytes run from 208 to 240 (33 letters in total). If the value is in the range [208; 240], I use the charset string array to get the Latin equivalent; otherwise the value is just the character code.

I don't know whether there is a library for this, but this method should give you a basic idea of how to write your own converter.

抚笙 2024-11-27 00:17:08

Thanks for the responses; they helped me find the exact solution: http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/12a3558d-fe48-44fd-840e-03facfd9c944
