Avoid printing the Unicode replacement character in Java
In Java, why does Character.toString((char) 65533) print out this symbol: �?
I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?
Answers (5)
One of the most likely scenarios is that you are trying to read ISO-8859 data using the UTF-8 character set. If the decoder comes across a byte sequence that is not valid UTF-8, it replaces it with the � symbol (U+FFFD).
Check your input streams, and make sure you read them using the correct character set.
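A minimal sketch of this failure mode, assuming the text is "café" stored as ISO-8859-1 bytes (the class and method names are illustrative, not from the answer). The same bytes are read once with the wrong charset and once with the right one:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Reads the whole stream as text, decoding it with the given charset.
    // InputStreamReader replaces malformed input with U+FFFD rather than failing.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an e-acute, encoded as ISO-8859-1: the accented letter is the single byte 0xE9.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: 0xE9 starts a multi-byte UTF-8 sequence that never completes,
        // so the decoder substitutes U+FFFD.
        String wrong = readAll(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8);

        // Right charset: the text comes back intact.
        String right = readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals("caf\uFFFD")); // true
        System.out.println(right.equals("caf\u00e9")); // true
    }
}
```

Passing the Charset explicitly matters because InputStreamReader without one falls back to the platform default, which may not match how the data was written.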
Because this exact character IS associated with that particular code point. It does not display a random character, as you seem to think.
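A quick check that nothing random is going on: 65533 is simply 0xFFFD, and Character.getName (Java 7+) reports the official Unicode name of that code point. The class and helper names are illustrative.

```java
class ReplacementCharDemo {

    // Formats a char as "U+XXXX NAME" using the JDK's Unicode data (Character.getName, Java 7+).
    static String describe(char c) {
        return String.format("U+%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        String s = Character.toString((char) 65533);

        // 65533 in decimal is 0xFFFD in hexadecimal, so this is exactly the U+FFFD code point.
        System.out.println(s.equals("\uFFFD"));     // true
        System.out.println(describe((char) 65533)); // U+FFFD REPLACEMENT CHARACTER
    }
}
```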
Your problem lies somewhere else. It at least boils down to this: every step which involves byte-char conversions (storing text in a file/DB, reading text from a file/DB, manipulating text, transferring text, displaying text, etcetera) should be set to use UTF-8.
What catches my eye is that Java does absolutely nothing special with 0xFFFD; when encoding, it just replaces characters the target charset can't represent with a question mark ?, and yet you keep insisting that 0xFFFD comes from Java. I know that Firefox does exactly what you describe, so are you maybe confusing "Firefox" with "Java"?
If this is true and you're actually talking about a Java web application, then you need to at least set the HTTP response encoding to UTF-8. You can do that by putting <%@ page pageEncoding="UTF-8" %> at the top of the JSP page in question. You may find this article useful for more background information and a detailed overview of all the steps and solutions you need to apply to solve this "Unicode problem".
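A rough sketch of the "UTF-8 at every step" advice for the file case, assuming Java 11+ for Files.writeString and Files.readString; the class name and sample text are illustrative. It also checks the encoding-side behaviour described above: a character that ISO-8859-1 cannot represent comes out as '?'.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8EndToEnd {

    // Writes the text and reads it back, naming UTF-8 explicitly on both sides
    // instead of relying on the platform default charset.
    static String roundTrip(Path file, String text) throws IOException {
        Files.writeString(file, text, StandardCharsets.UTF_8); // Java 11+
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        try {
            // "naive" with a diaeresis, a euro sign and two CJK characters
            String text = "na\u00efve \u20ac \u4e2d\u6587";
            System.out.println(roundTrip(tmp, text).equals(text)); // true
        } finally {
            Files.deleteIfExists(tmp);
        }

        // Encoding to a charset that cannot represent a character: String.getBytes
        // substitutes the charset's replacement byte, which for ISO-8859-1 is '?'.
        byte[] latin1 = "\u20ac".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length == 1 && latin1[0] == '?'); // true
    }
}
```

One caveat: the '?' substitution is what happens when encoding. When decoding bytes that are invalid in the chosen charset, Java's built-in decoders substitute U+FFFD instead, so reading with the wrong charset can also produce � inside a Java program.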
U+FFFD does not stand for any real text character; hence, the code is logically incorrect. The Unicode replacement character ((char)65533) is intended to be substituted for bad input.
How to fix it: don't put junk in strings. Strings are for text. Bytes are for random binary data.
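To make "strings are for text, bytes are for binary data" concrete, here is a small sketch with made-up bytes. Pushing them through a String loses information; if binary data genuinely has to travel as text, an explicit binary-to-text encoding such as Base64 (an assumption of this sketch, not something the answer prescribes) round-trips safely. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BytesAreNotText {

    // Forcing arbitrary bytes through a String: invalid UTF-8 becomes U+FFFD,
    // so the original bytes cannot be recovered.
    static byte[] viaString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // If binary data really must travel as text, encode it explicitly (here Base64).
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, 0x00, (byte) 0x80}; // not valid UTF-8
        System.out.println(Arrays.equals(data, viaString(data))); // false
        System.out.println(Arrays.equals(data, viaBase64(data))); // true
    }
}
```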
Well, what do you want it to do? If you're getting these characters "all over the place" I suspect you have bad data... it should be pretty rare that you receive data which can't be represented in Unicode.
How are you getting the data to start with?
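One way to find out where the bad data enters is to decode strictly at each input boundary, so invalid bytes fail loudly there instead of silently turning into � further along. A minimal sketch using the JDK's CharsetDecoder with CodingErrorAction.REPORT; the class name and sample bytes are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecoding {

    // Decodes bytes, but throws instead of silently inserting U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] input = {'c', 'a', 'f', (byte) 0xE9}; // ISO-8859-1 bytes, not UTF-8
        try {
            decodeStrict(input, StandardCharsets.UTF_8);
            System.out.println("decoded cleanly");
        } catch (CharacterCodingException e) {
            // The failure points at the source that needs a different charset (or fixing).
            System.out.println("bad input: " + e);
        }
    }
}
```

For malformed input the exception is a MalformedInputException (a subclass of CharacterCodingException), which tells you that this particular source is not in the charset you assumed.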
Have a look at this primer on character encodings.