Strange encoding/decoding problem
This line of code decodes a URL-encoded Chinese word and counts the bytes of the result:
URLDecoder.decode("%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94",
"UTF-8").getBytes().length
When I run it in a JSP page (on JBoss), it prints 5:
<%= URLDecoder.decode("%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94",
"UTF-8").getBytes().length %>
Running it in a desktop application prints 15:
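import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
public class DecodeTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // getBytes() with no argument encodes with the JVM's default charset
        System.out.println(URLDecoder.decode(
            "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94", "UTF-8"
        ).getBytes().length);
    }
}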
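To see where a count of 5 could come from, the sketch below re-encodes the same decoded word with two explicit charsets. It assumes (the question does not confirm this) that the JBoss JVM's default charset is a single-byte charset such as ISO-8859-1. String.getBytes(Charset) replaces characters the charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is '?'; the no-argument getBytes() leaves this case unspecified, but typical JVMs behave the same way. The class and method names are illustrative, and the Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    static final String ENCODED = "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94";

    // Byte count of the decoded word when re-encoded with the given charset
    static int byteCount(Charset charset) {
        String decoded = URLDecoder.decode(ENCODED, StandardCharsets.UTF_8);
        return decoded.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // UTF-8 needs 3 bytes for each of the 5 CJK characters
        System.out.println("UTF-8:      " + byteCount(StandardCharsets.UTF_8));
        // ISO-8859-1 cannot map them, so each becomes a single '?' byte
        System.out.println("ISO-8859-1: " + byteCount(StandardCharsets.ISO_8859_1));
        // The charset the no-argument getBytes() would use on this JVM
        System.out.println("default:    " + Charset.defaultCharset());
    }
}
```

Printing Charset.defaultCharset() inside the JSP would show which charset the server is actually using.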
Why the difference? And how can I make the JSP print 15 as well?
It seems like JBoss is using a different default encoding, which cannot represent all the characters in your string. You should probably use
getBytes("UTF-8")
instead of the no-argument getBytes(), so the byte count no longer depends on the server's default charset.
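A minimal sketch of that fix as a standalone program, naming the charset explicitly on both the decode and the encode side. In the JSP expression itself, the equivalent change is appending getBytes("UTF-8") instead of getBytes(). The class and method names here are illustrative, and the Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Decode and re-encode with an explicit charset on both sides, so the
    // result does not depend on the JVM's default charset (file.encoding).
    static int utf8ByteLength(String urlEncoded) {
        String decoded = URLDecoder.decode(urlEncoded, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8ByteLength(
            "%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94")); // prints 15
    }
}
```

Using StandardCharsets.UTF_8 rather than the string "UTF-8" also avoids the checked UnsupportedEncodingException.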
I don't know why there is a difference (that depends on the particular Java environments you're running), but I can tell you what that difference is:
There are 15 bytes in your string. These bytes represent 5 Unicode characters, of 3 bytes each.
You can tell because the first byte of a 3-byte UTF-8 sequence always starts with the hexadecimal digit "E" (lead bytes 0xE0 to 0xEF), and here every third byte does.
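The byte arithmetic in this answer can be checked mechanically. The sketch below parses the %XX escapes into raw bytes and counts the bytes whose top four bits are 1110 (0xE0 to 0xEF), which are the lead bytes of 3-byte UTF-8 sequences. The class and helper names are hypothetical, and the parser assumes the input consists only of %XX escapes.

```java
import java.io.ByteArrayOutputStream;

public class Utf8Leads {
    // Parse a string made only of %XX escapes into raw bytes
    static byte[] percentHexToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += 3) {
            if (s.charAt(i) != '%') {
                throw new IllegalArgumentException("expected '%' at index " + i);
            }
            out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
        }
        return out.toByteArray();
    }

    // Count bytes of the form 1110xxxx, i.e. lead bytes of 3-byte sequences
    static int threeByteLeads(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            if ((b & 0xF0) == 0xE0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] bytes = percentHexToBytes("%E4%BB%BB%E4%BD%95%E8%BD%A6%E8%BE%86%E5%BA%94");
        System.out.println("bytes:            " + bytes.length);         // 15
        System.out.println("3-byte sequences: " + threeByteLeads(bytes)); // 5
    }
}
```

Counting lead bytes like this only makes sense for input already known to be valid UTF-8; for arbitrary bytes, decoding with a CharsetDecoder set to CodingErrorAction.REPORT is the stricter check.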