Java - Convert from Unicode to ANSI
I have a string \u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF.
I need to convert it to Avwg wKse
wš—i K_v ejwQ` which is in ANSI format. How can I convert these Unicode characters to ANSI characters in Java?
Edit:
resultView.setTypeface(typeFace);
String str = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
resultView.setText(str);
Comments (3)
That's not ANSI format. The (misleadingly named) "ANSI" code pages in Windows are all based on ASCII, with different characters added in the high bytes. Byte 0x41 (A) as a leading letter in an ANSI code page always means Latin A and not Bengali আ.
What I think you have is a custom symbol font that maps arbitrary symbols to completely unrelated code points. Every such font has its own visual encoding; to convert between Unicode and the custom visual encoding you'd have to build up your own translation table by looking at the glyphs for each character and matching them to the Unicode character that represents the same letter.
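As a rough illustration of such a translation table, here is a minimal sketch. The mappings are only those that can be read off the example in the question (আ as "Av", ম as "g", and so on); they are assumptions, not a verified table for any particular font. The class and method names are hypothetical. The sketch also assumes the font draws the vowel sign ি (U+09BF) before its consonant, as the question's expected output suggests, and it does not handle conjuncts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Unicode Bengali -> font-specific "visual" encoding.
// The table is inferred from the question's example only and is incomplete.
class VisualEncodingSketch {
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0986', "Av"); // আ
        TABLE.put('\u09AE', "g");  // ম
        TABLE.put('\u0995', "K");  // ক
        TABLE.put('\u09A5', "_");  // থ
        TABLE.put('\u09BE', "v");  // া
        TABLE.put('\u09AC', "e");  // ব
        TABLE.put('\u09B2', "j");  // ল
        TABLE.put('\u099B', "Q");  // ছ
    }

    private static final char VOWEL_SIGN_I = '\u09BF';  // ি
    private static final String VOWEL_SIGN_I_GLYPH = "w"; // assumed glyph code

    static String toVisual(String unicode) {
        StringBuilder out = new StringBuilder();
        int lastStart = 0; // where the most recently emitted letter begins in out
        for (int i = 0; i < unicode.length(); i++) {
            char c = unicode.charAt(i);
            if (c == VOWEL_SIGN_I) {
                // Stored after the consonant in Unicode, drawn before it visually.
                out.insert(lastStart, VOWEL_SIGN_I_GLYPH);
            } else {
                lastStart = out.length();
                String mapped = TABLE.get(c);
                // Characters missing from the table pass through unchanged.
                out.append(mapped != null ? mapped : String.valueOf(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVisual("\u0986\u09AE\u09BF"));                    // Avwg
        System.out.println(toVisual("\u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF")); // K_v ejwQ
    }
}
```

A real converter would need the full table for the specific font, plus rules for conjuncts and the other reordered vowel signs, which is part of why a Unicode font is the better route.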
I would strongly advise getting a proper Unicode-aware font that supports Bengali instead. Content stuck in an arbitrary font-specific encoding is difficult to deal with (because semantically you really are dealing with a string that means "AvwgwKsewš—i K_v ejwQ", with all the editing and case-changing gotchas that implies).
Visual-encoded fonts are an unhappy relic of the time before Windows had good Unicode (or even ISCII) support. They should not be used for anything today.
I'm not sure exactly what you're asking, but I'll assume you're asking how to convert some characters from Unicode into an 8-bit character set (e.g. ISO-8859-1 is the character set for 'Western European' languages, like English).
I don't know of any way to automatically detect the relevant 8-bit charset, so I looked up one of your characters (at http://unicode.org/charts/ ), and I can see that these characters are Bengali.
I think the equivalent 8-bit character set for Bengali is known as x-iscii-be. I don't have this installed on my system, so I couldn't do the conversion successfully.
EDIT: Java does not support the charset x-iscii-be, but I'll leave the remainder of this answer for illustration purposes. See http://download.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html for a list of supported Charsets.
EDIT2: Android certainly doesn't guarantee support for this charset (the only 8-bit character set it guarantees is ISO-8859-1). See: http://developer.android.com/reference/java/nio/charset/Charset.html .
*So, I think you should run some Charset-detecting code on a Bengali Android device - perhaps it supports this charset. Everything you need is in my code sample.*
For Java to convert your data to a different charset, all you need to do is check that the desired Charset is installed, and then specify that Charset when you convert the String into bytes.
The conversion itself would be extremely simple:
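A minimal sketch of what that conversion might look like, using only `Charset.isSupported`, `Charset.forName` and `String.getBytes(Charset)` from the standard library. The class and method names are hypothetical, and ISO-8859-1 is used in the example only because it is guaranteed to be present.

```java
import java.nio.charset.Charset;

// Hypothetical helper: encode a String into bytes of a named charset,
// after checking that the running JVM actually provides that charset.
class CharsetConversionSketch {
    static byte[] encode(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Charset not available: " + charsetName);
        }
        // Characters the target charset cannot represent are replaced with the
        // charset's replacement byte ('?' for ISO-8859-1); no exception is thrown.
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] latin = encode("A\u00E9", "ISO-8859-1");  // 0x41, 0xE9
        byte[] lossy = encode("\u0986", "ISO-8859-1");   // 0x3F ('?'): Bengali is not representable
        System.out.println(latin.length + " " + lossy.length);
    }
}
```

Note that the silent replacement with '?' means a successful call does not prove the text survived; comparing a round trip (`new String(bytes, charset)`) against the input is one way to check.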
So, you see, the String itself is stored in a kind of 'normalised' form (UTF-16 internally, regardless of the defaultCharset), and you can treat getBytes(charsetName) as a kind of 'alternative output format' for the String. Sorry - poor explanation!
In your situation, perhaps you just need to assign a Charset to the resultView, and the framework will work its magic for you ...
Here's some test code I put together to illustrate the point, and to check whether a given charset is supported on a system.
I have got this code to output the byte-arrays as 'hex' strings, so that you can see that the data is different after conversion.
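A sketch along those lines, assuming nothing beyond the standard library: it reports whether a few charsets are available and prints the encoded bytes as hex. The class name, the `toHex` helper and the list of charset names are illustrative choices, not the code originally posted with this answer.

```java
import java.nio.charset.Charset;

// Hypothetical demo: check charset availability and show encoded bytes as hex.
class CharsetHexDemo {
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0986\u09AE\u09BF"; // first word of the question's string
        String[] names = {"UTF-8", "UTF-16BE", "ISO-8859-1", "x-iscii-be"};
        for (String name : names) {
            if (Charset.isSupported(name)) {
                byte[] encoded = text.getBytes(Charset.forName(name));
                System.out.println(name + ": " + toHex(encoded));
            } else {
                System.out.println(name + ": not supported on this system");
            }
        }
    }
}
```

For the single character U+0986, UTF-8 yields the bytes E0 A6 86 and UTF-16BE yields 09 86, which shows that the same String produces different byte sequences depending on the charset requested.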
See also here for more information on charset conversion: http://download.oracle.com/javase/tutorial/i18n/text/string.html
Character sets are a tricky business, so please forgive my convoluted answer.
HTH
I've written a class which can solve the problem of 09CB ো, 09CC ৌ, 09C7 ে, 09C8 ৈ, 09BF ি, ্য, ্র, ৃ in UTF-8. I reshape the text by editing the font glyphs, so you don't need to change it to extended ASCII. :( But I still couldn't solve your Bengali conjuncts. For proper rendering it requires android 3.5 or higher; it'll work smoothly on android 4.0 (Ice Cream Sandwich).