Java: converting from Unicode to ANSI

Posted on 2024-12-12 20:04:12

I have the string \u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF.
I need to convert it to Avwg wKsewš—i K_v ejwQ` which is in ANSI format. How can I convert this Unicode string to ANSI characters in Java?

Edit:

resultView.setTypeface(typeFace);
String str = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
resultView.setText(str);

你的心境我的脸 2024-12-19 20:04:12

"I need to convert it to AvwgwKsewš—i K_v ejwQ, which is in ANSI format."

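That is not ANSI format. The (misleadingly named) "ANSI" code pages in Windows are all based on ASCII, with different characters added in the high bytes. Byte 0x41 (A) as a leading letter in an ANSI code page always means a Latin A, never a Bengali আ.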
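As a small illustration of that point, the sketch below checks what byte a Latin A becomes in two ASCII-based charsets that every Java platform provides, and whether the Bengali letter U+0986 can be encoded at all. The class and method names are made up for this example, and ISO-8859-1 stands in for a Windows code page here because it is guaranteed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiBasedCodePages {
    // Returns the single byte that ch encodes to in cs,
    // or -1 if cs cannot encode ch as exactly one byte.
    static int singleByte(char ch, Charset cs) {
        if (!cs.newEncoder().canEncode(ch)) {
            return -1;
        }
        byte[] bytes = String.valueOf(ch).getBytes(cs);
        return bytes.length == 1 ? (bytes[0] & 0xFF) : -1;
    }

    public static void main(String[] args) {
        // Latin capital A is byte 0x41 in both ASCII-based charsets.
        System.out.printf("A in US-ASCII:   0x%02X%n", singleByte('A', StandardCharsets.US_ASCII));
        System.out.printf("A in ISO-8859-1: 0x%02X%n", singleByte('A', StandardCharsets.ISO_8859_1));
        // Bengali letter AA (U+0986) has no byte in ISO-8859-1 at all.
        System.out.println("U+0986 encodable in ISO-8859-1: "
                + StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u0986'));
    }
}
```

So a byte sequence starting with 0x41 read through such a charset is the Latin text "A...", whatever glyph a custom font later draws for it.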

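I think what you have is a custom symbol font that maps arbitrary symbols to completely unrelated code points. Every such font has its own visual encoding. To convert between Unicode and a custom visual encoding, you have to build your own conversion table by looking at the glyph for each character and matching it with the Unicode character that represents the same letter.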
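A hand-built table of that kind might look roughly like the sketch below. The class name, method name and the handful of entries are assumptions for illustration only: the entries are guesses read off the sample in the question, not a verified table for any real font. The sketch also ignores the reordering a visual encoding needs (for example, a vowel sign drawn before its consonant).

```java
import java.util.HashMap;
import java.util.Map;

class FontEncodingTable {
    // Hypothetical entries; a real table must be built by inspecting
    // the glyphs of the actual font, one character at a time.
    private static final Map<Character, String> UNICODE_TO_FONT = new HashMap<>();
    static {
        UNICODE_TO_FONT.put('\u0995', "K"); // U+0995 BENGALI LETTER KA
        UNICODE_TO_FONT.put('\u09AE', "g"); // U+09AE BENGALI LETTER MA
        UNICODE_TO_FONT.put('\u09AC', "e"); // U+09AC BENGALI LETTER BA
        UNICODE_TO_FONT.put('\u09B2', "j"); // U+09B2 BENGALI LETTER LA
        UNICODE_TO_FONT.put(' ', " ");
    }

    // Maps each Unicode character through the table; fails loudly on gaps
    // so that missing entries are noticed instead of silently dropped.
    static String toFontEncoding(String unicodeText) {
        StringBuilder out = new StringBuilder();
        for (char c : unicodeText.toCharArray()) {
            String mapped = UNICODE_TO_FONT.get(c);
            if (mapped == null) {
                throw new IllegalArgumentException(
                        String.format("No table entry for U+%04X", (int) c));
            }
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFontEncoding("\u0995\u09AE \u09AC\u09B2"));
    }
}
```

Failing on an unmapped character is a deliberate choice in this sketch: with a hand-built table, gaps are likely, and a thrown exception points at the exact code point that still needs an entry.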

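I would strongly recommend using a proper Unicode-aware font with Bengali support instead. Content that is stuck in an arbitrary font-specific encoding is difficult to process, because semantically you are handling a string that says "AvwgwKsewš—i K_v ejwQ", with all the editing and case-changing gotchas that implies.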
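One of those case-changing gotchas can be shown in a few lines. In this sketch (the class and method names are invented for the example), upper-casing leaves the Unicode Bengali text untouched because Bengali has no letter case, while the font-encoded string is ordinary Latin text and gets rewritten into different characters, which the custom font would then draw as different glyphs.

```java
import java.util.Locale;

class CaseChangeGotcha {
    // True if upper-casing leaves the text exactly as it was.
    static boolean survivesUpperCasing(String text) {
        return text.toUpperCase(Locale.ROOT).equals(text);
    }

    public static void main(String[] args) {
        String unicodeText = "\u0986\u09AE\u09BF"; // first word of the question's string, in Unicode
        String fontEncoded = "Avwg";               // the same word as the question shows it in the custom font

        System.out.println(survivesUpperCasing(unicodeText)); // prints true
        System.out.println(survivesUpperCasing(fontEncoded)); // prints false
        System.out.println(fontEncoded.toUpperCase(Locale.ROOT)); // prints AVWG
    }
}
```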

Visually encoded fonts are an unfortunate relic of the time before Windows had good Unicode (or even ISCII) support. They should not be used for anything today.


ま柒月 2024-12-19 20:04:12

I'm not sure exactly what you're asking, but I'll assume you want to convert characters from Unicode into an 8-bit character set (for example, ISO-8859-1 is the character set for Western European languages such as English).

I don't know of any way to detect the relevant 8-bit charset automatically, so I looked up one of your characters at http://unicode.org/charts/ and I can see that these characters are Bengali.

I think the equivalent 8-bit character set for Bengali is known as x-iscii-be.
I don't have it installed on my system, so I couldn't run the conversion successfully.

EDIT: Java does not support the charset x-iscii-be, but I'll leave the remainder of this answer for illustration purposes. See http://download.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html for a list of supported Charsets.

EDIT2: Android certainly doesn't guarantee support for this charset (the only 8-bit character set it guarantees is ISO-8859-1). See: http://developer.android.com/reference/java/nio/charset/Charset.html

*So, I think you should run some charset-detecting code on a Bengali Android device; perhaps it supports this charset. Everything you need is in my code sample.*
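For the detection step on its own, a short sketch like the following could be run on the device. It lists every installed charset whose canonical name or alias contains a given fragment. The class and method names are made up for this example, and searching for "iscii" is only a guess at how such a charset might be named on a given platform.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CharsetSearch {
    // Returns the canonical names of all installed charsets whose name
    // or any alias contains the fragment, ignoring case.
    static List<String> findCharsets(String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            boolean hit = cs.name().toLowerCase(Locale.ROOT).contains(needle);
            for (String alias : cs.aliases()) {
                hit |= alias.toLowerCase(Locale.ROOT).contains(needle);
            }
            if (hit) {
                matches.add(cs.name());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // An empty list means nothing ISCII-like is installed on this runtime.
        System.out.println("ISCII-like charsets: " + findCharsets("iscii"));
    }
}
```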

To convert your data to a different charset in Java, check that the desired Charset is installed, then specify it when you convert the String into bytes.

The conversion itself would be extremely simple:

    str.getBytes("x-iscii-be");
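A slightly more defensive form of that call, as a sketch: it checks `Charset.isSupported` first and only then encodes, so an unavailable charset produces an empty result instead of an exception. The class name `SafeEncode` and the method `encode` are invented for this example.

```java
import java.nio.charset.Charset;
import java.util.Optional;

class SafeEncode {
    // Encodes text in the named charset, or returns empty if the
    // charset is not available on this runtime.
    static Optional<byte[]> encode(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return Optional.empty();
        }
        return Optional.of(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String str = "\u0986\u09AE\u09BF";
        Optional<byte[]> encoded = encode(str, "x-iscii-be");
        if (encoded.isPresent()) {
            System.out.println("Encoded to " + encoded.get().length + " bytes");
        } else {
            System.out.println("x-iscii-be is not supported on this system");
        }
    }
}
```

Note that characters the target charset cannot represent are replaced by `getBytes` rather than reported, so a successful call does not by itself prove the text survived the conversion.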

So, you see, the String itself is stored in a kind of 'normalised' form (a sequence of UTF-16 code units, independent of the default charset), and you can treat getBytes(charsetName) as a kind of 'alternative output format' for the String. Sorry - poor explanation!
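To make the 'alternative output format' idea concrete, here is a minimal sketch that encodes the same String in several standard charsets and prints the bytes as hex. The class name and the `hex` helper are assumptions for this example; the charsets are the ones every Java platform is required to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OutputFormats {
    // Encodes text in the given charset and renders each byte as two hex digits.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String western = "\u00c2"; // capital A with circumflex
        System.out.println(hex(western, StandardCharsets.UTF_8));      // prints c382
        System.out.println(hex(western, StandardCharsets.UTF_16BE));   // prints 00c2
        System.out.println(hex(western, StandardCharsets.ISO_8859_1)); // prints c2

        String bengali = "\u0986"; // Bengali letter AA
        System.out.println(hex(bengali, StandardCharsets.UTF_8));      // prints e0a686
        // ISO-8859-1 has no Bengali letters, so getBytes substitutes '?' (0x3f).
        System.out.println(hex(bengali, StandardCharsets.ISO_8859_1)); // prints 3f
    }
}
```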

In your situation, perhaps you just need to assign a Charset to the resultView, and the framework will work its magic for you ...

Here's some test code I put together to illustrate the point, and to check whether a given charset is supported on a system.

The code prints the byte arrays as hex strings, so that you can see that the data is different after conversion.

import java.io.UnsupportedEncodingException;
import java.math.BigInteger;
import java.nio.charset.Charset;

public class UnicodeTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        testWestern();
        testBengali();
    }

    public static void testWestern() throws UnsupportedEncodingException {
        String unicodeStr = "\u00c2"; // Capital A with a circumflex accent.
        String charsetName = "ISO-8859-1";
        System.out.println("Input (printed using the default charset): " + unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void testBengali() throws UnsupportedEncodingException {
        String unicodeStr = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
        String charsetName = "x-iscii-be";
        System.out.println(unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void attempt8bitCharsetConversion(String input, String charsetName) throws UnsupportedEncodingException {
        // isSupported matches canonical names and aliases, ignoring case.
        // Throw only when the charset is missing, not after a successful conversion.
        if (!Charset.isSupported(charsetName)) {
            throw new UnsupportedEncodingException(charsetName + " is not supported on this system");
        }
        Charset target = Charset.forName(charsetName);
        System.out.println("HEXED input : " + toHex(input.getBytes(Charset.defaultCharset())));
        System.out.println("HEXED output: " + toHex(input.getBytes(target)));
    }

    public static String toHex(byte[] input) {
        // Signum 1 treats the bytes as an unsigned magnitude, so a leading byte
        // of 0x80 or above is not printed as a negative number.
        // Leading zero bytes are still dropped by this formatting.
        return String.format("%x", new BigInteger(1, input));
    }
}

See also here for more information on charset conversion: http://download.oracle.com/javase/tutorial/i18n/text/string.html

Character sets are a tricky business, so please forgive my convoluted answer.

HTH


美胚控场 2024-12-19 20:04:12

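I wrote a class that works around the rendering problems with 09CB ো, 09CC ৌ, 09C7 ে, 09C8 ৈ, 09BF ি, ্য, ্র and ৃ in UTF-8 text. I reshaped them by editing the font glyphs, so you do not need to change the text to extended ASCII. :( I still could not solve the problem with Bengali conjuncts, though. Correct rendering requires android 3.5 or later, and it runs smoothly on android 4.0 (Ice Cream Sandwich).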
