Java Map: how do I correctly put a UTF-8 string into a map?
I have a Map (a LinkedHashMap, to be exact) and I want to put a String object into it.
I then read the value back to see what is actually stored.
The string itself contains non-ASCII characters (Cyrillic, Korean, etc.).
Once I put it into the map and read it back, these characters are replaced with ???s.
Some code:
Map obj = new LinkedHashMap();
System.out.println("name: " + getName()); // prints "i4niac_сим_sim"
obj.put("name", getName());
System.out.println("written stuff: " + obj.get("name")); // prints i4niac_???_sim
What's the trick here?
I am using this map to build a JSON object with json-simple and send it from the server to the client.
Update:
Ugh, sorry for all the mess.
First I blamed the datastore, then the map, and finally, as expected, it was my own mistake somewhere else.
I was sending the JSON data from App Engine with the content type set to "application/json":
public void doPost(HttpServletRequest req, HttpServletResponse resp) {
// ...
resp.setContentType("application/json");
resp.getWriter().println(jsonObj.toString());
}
The response was never sent as UTF-8, no matter which tricks I tried on the backend.
After changing to
resp.setCharacterEncoding("UTF-8");
I finally received UTF-8 escape codes for the non-ASCII characters.
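
A minimal sketch of why the characters turned into question marks, assuming the response writer fell back to ISO-8859-1 (the servlet default when no character encoding has been set). The class and method names are illustrative and not from the original post, and the servlet is simulated here with a plain encode/decode round trip rather than a real HttpServletResponse.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes s to bytes with the given charset and decodes it again,
    // roughly what happens between the response writer and the client.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String name = "i4niac_\u0441\u0438\u043c_sim";

        // ISO-8859-1 cannot represent Cyrillic, so each such char becomes '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
    }
}
```

In a real servlet, resp.setCharacterEncoding("UTF-8") has to be called before resp.getWriter() for it to affect the writer.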
Comments (3)
How do you know the characters are replaced with ???s? Could it be that your console is set to an ASCII code page or similar, or that the font used in the console doesn't display these characters properly? Have you tried writing all of this to a file, opening it in something like MS Word, and checking whether the characters are still replaced?
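
The file check suggested above could be sketched as follows, assuming a temporary file is acceptable as the target. The class name, method name, and file prefix are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileCheck {
    // Writes s to a new temporary file as UTF-8 and returns the file's path.
    static Path writeUtf8(String s) {
        try {
            Path p = Files.createTempFile("name-check", ".txt");
            Files.write(p, s.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        Path p = writeUtf8("i4niac_\u0441\u0438\u043c_sim");
        System.out.println("Wrote " + p + "; open it in an editor that reads UTF-8");
    }
}
```

If the file shows the Cyrillic letters correctly, the string itself is intact and the ???s come from the console or from a later encoding step.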
Strings are immutable and are always encoded as UTF-16. If you want to represent character data in any other encoding, you must use a byte array.
Since LinkedHashMap doesn't mutate or serialize your string, the object you put into it should be the one returned.
The only explanations I can think of:
- getName() doesn't return a reference to the same String every time (most likely)
- System.out (a PrintStream) is modified concurrently
You can emit the hexadecimal form of the String, for example for i4niac_сим_sim, to make sure display bugs aren't the problem.
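
The answer's own code was not preserved on this page. The hexadecimal check it describes could look like the sketch below, which prints each UTF-16 code unit as four hex digits. The class and method names are hypothetical.

```java
public class HexDump {
    // Returns the UTF-16 code units of s as space-separated 4-digit hex values.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String name = "i4niac_\u0441\u0438\u043c_sim";
        // prints: 0069 0034 006e 0069 0061 0063 005f 0441 0438 043c 005f 0073 0069 006d
        System.out.println(toHex(name));
    }
}
```

The output is pure ASCII, so it survives any console encoding. A character that had really been replaced in the string would show up as 003f (the code for '?') instead of a value such as 0441.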
Recompile your code with javac's -encoding flag set to the encoding your source files are saved in (UTF-8 here), so that non-ASCII string literals are read correctly.