Java Map, how to put UTF-8 string to the map correctly?

Posted on 2024-11-07 09:39:06

I have a Map, a LinkedHashMap to be more exact, and I want to put a String into it.
Then I read the value back to see what is actually stored.
The string contains non-ASCII characters (Cyrillic, Korean, etc.).
Once I put it into the map and read it back, these characters are replaced with ???s.
Some code:

Map obj = new LinkedHashMap();
System.out.println("name: " + getName());  // prints "i4niac_сим_sim"
obj.put("name", getName());
System.out.println("written stuff: " + obj.get("name"));  // prints i4niac_???_sim

What's the trick here?
I am using this map to build a JSON object with json-simple and send it from the server to the client.

Update:

Ugh, sorry for all the mess.
First I blamed the datastore, then the map; in the end, as expected, it was my own fault in another place.
I was sending the JSON data from App Engine with the content type set to "application/json":

public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    // ...
    resp.setContentType("application/json");
    resp.getWriter().println(jsonObj.toString());
}

It was just never sent as UTF-8, no matter which tricks I tried on the backend side.
After changing to

    resp.setCharacterEncoding("UTF-8");

I finally received UTF-8 escape codes for non-ASCII characters.
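
For readers hitting the same symptom: when no character encoding is set, the servlet API's response writer falls back to ISO-8859-1, which cannot represent Cyrillic, so each such character is replaced with "?" on the way out. Note also that setCharacterEncoding only takes effect if it is called before getWriter(). The sketch below does not use the servlet API at all; it only reproduces the encoding step with plain JDK calls, and the class and method names (ResponseEncodingDemo, encodeThenDecode) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResponseEncodingDemo {

    // Hypothetical helper: encode the text as the server would, then decode it
    // as a client using the same charset would.
    static String encodeThenDecode(String text, Charset charset) {
        byte[] wire = text.getBytes(charset); // unmappable characters become '?'
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // "i4niac_сим_sim", with the Cyrillic letters written as Unicode escapes
        String name = "i4niac_\u0441\u0438\u043c_sim";

        // ISO-8859-1 has no Cyrillic letters, so they are lost during encoding.
        System.out.println(
            encodeThenDecode(name, StandardCharsets.ISO_8859_1).equals("i4niac_???_sim")); // true

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(
            encodeThenDecode(name, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

The map is not involved at any point; the loss happens only when characters are turned into bytes with a charset that cannot represent them.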

Comments (3)

却一份温柔 2024-11-14 09:39:06

How do you know the characters are replaced with ???s? Could it be that your console is set to an ASCII code page or similar, or that the font used in the console cannot render those characters? Have you tried writing all of this to a file, opening it in something like MS Word, and checking whether the characters are still wrong?

怀念你的温柔 2024-11-14 09:39:06

Java Map, how to put UTF-8 string to the map correctly?

Strings are immutable and are always encoded as UTF-16. If you want to represent character data in any other encoding, you must use a byte array.
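
As a small illustration of that distinction (a sketch only; the class and method names are invented here), the same text has 14 UTF-16 code units as a String but 17 bytes once encoded as UTF-8, because each of the three Cyrillic letters takes two bytes:

```java
import java.nio.charset.StandardCharsets;

final class StringEncodings {

    // Hypothetical helpers: convert between a String and its UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "i4niac_сим_sim", with the Cyrillic letters written as Unicode escapes
        String name = "i4niac_\u0441\u0438\u043c_sim";
        byte[] utf8 = toUtf8(name);

        System.out.println(name.length());               // 14 UTF-16 code units
        System.out.println(utf8.length);                 // 17 bytes in UTF-8
        System.out.println(fromUtf8(utf8).equals(name)); // true
    }
}
```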

Since LinkedHashMap doesn't mutate or serialize your string, the object value you put into it should be the one returned.

The only explanations I can think of:

  • getName() doesn't return a reference to the same String every time (most likely)
  • the System.out PrintStream is modified concurrently
  • the encoding of the console receiving the data is modified concurrently

You can emit the hexadecimal form of the String to ensure display bugs aren't the problem:

public static String toCodeUnits(String s) {
  StringBuilder sb = new StringBuilder();
  for(char codeUnit : s.toCharArray()) {
    sb.append(String.format("%04x ", (int) codeUnit));
  }
  return sb.toString();
}

For i4niac_сим_sim, this code will return:

"0069 0034 006e 0069 0061 0063 005f 0441 0438 043c 005f 0073 0069 006d "
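
To check the map itself, a minimal sketch along these lines could be used (it reuses toCodeUnits from above; the class name MapRoundTrip is made up, and the literal stands in for whatever getName() returns):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapRoundTrip {

    // Same helper as above: hex dump of the UTF-16 code units of a String.
    static String toCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char codeUnit : s.toCharArray()) {
            sb.append(String.format("%04x ", (int) codeUnit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for getName(): "i4niac_сим_sim" written with Unicode escapes
        String name = "i4niac_\u0441\u0438\u043c_sim";

        Map<String, String> obj = new LinkedHashMap<>();
        obj.put("name", name);
        String stored = obj.get("name");

        // The map hands back the very same object, so nothing can have changed.
        System.out.println(stored == name); // true
        System.out.println(toCodeUnits(stored).equals(toCodeUnits(name))); // true
    }
}
```

If the hex dump taken before put and after get is identical, any "???" seen on screen comes from how the text is printed or transmitted, not from the map.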

童话里做英雄 2024-11-14 09:39:06

Recompile your code with the -encoding flag, like this:

javac -encoding UTF-8 Test3.java
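
The flag matters only when non-ASCII characters appear literally in the source file. One way to rule out the source encoding as a factor, sketched here with invented names, is to write such characters as Unicode escapes, which compile to the same string regardless of the -encoding setting:

```java
final class SourceEncodingCheck {

    // Hypothetical helper: the Cyrillic text "сим" written with Unicode escapes,
    // so the source file itself contains only ASCII.
    static String cyrillicSim() {
        return "\u0441\u0438\u043c";
    }

    public static void main(String[] args) {
        String s = cyrillicSim();
        System.out.println(s.length());        // 3
        System.out.println((int) s.charAt(0)); // 1089, i.e. 0x0441
    }
}
```

If the code units are correct with escapes but wrong with literal characters, the compiler is reading the source file with the wrong encoding and the -encoding flag is the fix.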