Japanese character encoding in Java

Posted on 2024-10-20 08:01:08

I have a file name that contains Japanese characters: S－最終条件.pdf. In Java, the file name is also S－最終条件.pdf.

// Support for Japanese file name
fileNameX = new String(fileName.getBytes("Shift_JIS"),"ISO8859_1");

The resulting fileNameX comes out as S?最終条件.pdf, which causes an error. I am trying to stream the file out as a PDF, but the Japanese character "－" is not recognised, and an error is thrown while streaming.

Please help me solve this issue.
Thanks, Prasanna

落在眉间の轻吻 2024-10-27 08:01:08

Let's see what your code actually does:

// Encode fileName (a UTF-16 String) as Shift_JIS.
// bytes now holds the binary Shift_JIS representation of the string.
final byte[] bytes = fileName.getBytes("Shift_JIS");

// Create a new UTF-16 String by decoding bytes as ISO8859_1,
// i.e. the Shift_JIS-encoded bytes are misinterpreted as ISO8859_1.
new String(bytes,"ISO8859_1");

Java strings use UTF-16 for their internal representation. You cannot specify a target encoding when you create a string, because UTF-16 is fixed. Instead, you have to specify the correct source encoding of the byte array, which here is "Shift_JIS".

The file name should come out correct without any conversion.
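
To make the point about source encodings concrete, here is a minimal sketch. The class name MismatchedCharsetDemo and the helper reinterpret are made up for illustration, and the sketch assumes the JDK in use provides the Shift_JIS charset. It encodes a string with one charset and decodes it with another, as the code in the question does, and then repeats the round trip with the same charset on both sides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchedCharsetDemo {
    // Hypothetical helper: encode s with one charset, then decode the
    // resulting bytes with another (possibly different) charset.
    public static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Assumption: Shift_JIS is available in this JDK.
        Charset sjis = Charset.forName("Shift_JIS");
        // "最終条件.pdf", written with escapes to avoid source-encoding issues.
        String kanji = "\u6700\u7D42\u6761\u4EF6.pdf";

        // Mismatched charsets: every Shift_JIS byte turns into one Latin-1 char,
        // so the result is a different, garbled string.
        String garbled = reinterpret(kanji, sjis, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(kanji));

        // Matching charsets: the bytes are decoded the way they were encoded.
        String roundTrip = reinterpret(kanji, sjis, sjis);
        System.out.println(roundTrip.equals(kanji));
    }
}
```

The garbled string has one char per Shift_JIS byte, which is why its length differs from the original name.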

我纯我任性 2024-10-27 08:01:08

This is a mapping problem between Shift_JIS and Unicode.
Shift_JIS does not contain all the characters of Unicode, so some characters become "?".

The following is the result of converting from Unicode to Shift_JIS.

RESULT  UNICODE
[NG]    U+2012 (FIGURE DASH)
[NG]    U+2013 (EN DASH)
<OK>    U+2014 (EM DASH)
[NG]    U+2015 (HORIZONTAL BAR)
<OK>    U+2212 (MINUS SIGN)
[NG]    U+FF0D (FULLWIDTH HYPHEN-MINUS)

One solution is to replace the affected characters.

U+2012,U+2013,U+2015 --> U+2014
U+FF0D               --> U+2212
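
The replacement listed above could be sketched roughly as follows. The class name DashNormalizer and the method normalizeForShiftJis are hypothetical names chosen for this example, and the sketch assumes the Shift_JIS charset is available in the JDK. Note that the normalized name no longer contains the original characters, so it would only suit cases where a visually similar name is acceptable.

```java
import java.nio.charset.Charset;

class DashNormalizer {
    // Hypothetical helper: map dash-like characters to the ones
    // that the table above reports as convertible.
    public static String normalizeForShiftJis(String s) {
        return s.replace('\u2012', '\u2014')   // FIGURE DASH     -> EM DASH
                .replace('\u2013', '\u2014')   // EN DASH         -> EM DASH
                .replace('\u2015', '\u2014')   // HORIZONTAL BAR  -> EM DASH
                .replace('\uFF0D', '\u2212');  // FULLWIDTH HYPHEN-MINUS -> MINUS SIGN
    }

    public static void main(String[] args) {
        // Assumption: Shift_JIS is available in this JDK.
        Charset sjis = Charset.forName("Shift_JIS");
        // "S－最終条件.pdf", written with escapes to avoid source-encoding issues.
        String fileName = "S\uFF0D\u6700\u7D42\u6761\u4EF6.pdf";
        String normalized = normalizeForShiftJis(fileName);

        // Report whether each form can be encoded without loss.
        System.out.println(sjis.newEncoder().canEncode(fileName));
        System.out.println(sjis.newEncoder().canEncode(normalized));
    }
}
```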

夜夜流光相皎洁 2024-10-27 08:01:08

The answers by @josefx and @Yu Sun corn are both correct.

First, as @josefx answered, when you want the Shift JIS representation of a string and then convert it back to a String object, you have to pass the same encoding to String#getBytes(String charsetName) and to the constructor String(byte[] bytes, String charsetName).

Second, you have to use Windows-31J instead of Shift_JIS as the encoding name. Windows-31J and Shift_JIS use the same encoding scheme, but their character sets differ slightly: Windows-31J has some additional characters. Note that Windows documentation refers to Windows-31J as "Shift JIS", so in most cases you should use Windows-31J when you want Shift JIS. As @Yu Sun corn answered, the string "S－最終条件.pdf" contains a character that is not in the Shift JIS character set: －. The Windows-31J character set does contain this character.

Finally, the code you should use will be like this:

// Get the byte-stream representation of Japanese characters in Windows-31J encoding.
// Windows-31J (aka MS932) is the default encoding when you run Java VM in Windows with Japanese locale.
byte[] textBytes = name.getBytes("Windows-31J");

// Reverse byte-stream representation to a String object
System.out.println(new String(textBytes, "Windows-31J"));
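
Before picking an encoding name, it can help to check whether a given charset is able to represent the file name at all. The sketch below uses the standard CharsetEncoder.canEncode method. The class name EncodableCheck and its helper are invented for this example, and it assumes that both Shift_JIS and Windows-31J are available in the JDK.

```java
import java.nio.charset.Charset;

class EncodableCheck {
    // Hypothetical helper: true if every character of s can be
    // encoded in the charset with the given name.
    public static boolean canEncode(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // "S－最終条件.pdf", written with escapes to avoid source-encoding issues.
        String fileName = "S\uFF0D\u6700\u7D42\u6761\u4EF6.pdf";

        // Compare the charsets discussed in this thread, plus UTF-8 as a baseline.
        for (String name : new String[] {"Shift_JIS", "Windows-31J", "UTF-8"}) {
            System.out.println(name + ": " + canEncode(fileName, name));
        }
    }
}
```

A charset that reports false here would replace the unsupported character with "?" when getBytes is called.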
