HTML charset and encoding
I need to build a JSP application that will be localized to non-Latin languages. The JSP page contains tags that retrieve some display elements from a database (MySQL) and others from a resource bundle (a properties file whose contents are written in Unicode; I also tried UTF-8).
The problem, I believe, is that the string returned from the resource bundle seems to place each byte of the Unicode/UTF-8 code point in its own string character. For example, \u0620 occupies two characters in the returned string: the first character is 0x06 and the second is 0x20. Strings retrieved from the resource bundle are double in size.
Is my problem in the properties file itself or in the ResourceBundle?
Any help is much appreciated.
Comments (1)
If you use UTF-8, then a character such as U+0620 really IS 2 bytes (UTF-8 uses one to four bytes per character, depending on the code point). Whether the programming language handles it as two bytes or as one character shouldn't matter to your actual code.
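To make the byte counts concrete, here is a small sketch (the class and helper names are made up for illustration) that prints how U+0620 is laid out in UTF-8 and in UTF-16. Note that the 0x06 and 0x20 values mentioned in the question match the UTF-16 form, not the UTF-8 one, which may hint that the file is being decoded with a different charset than the one it was saved in; that is only a guess from the symptoms described.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Hypothetical helper: formats a byte array as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u0620"; // one Arabic letter, a single Java char

        System.out.println(s.length());                                   // 1
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // D8 A0
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 06 20
    }
}
```

A correctly decoded string has length 1 here; a length of 2 means the bytes were turned into chars one by one somewhere along the way.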
Also, avoid UTF-16 and the other Unicode encodings for your files and pages. UTF-8 is the only "proper" way to do things nowadays.
Also, as bmargulies pointed out, you may want to set the page encoding in the JSP page directive:
<%@ page pageEncoding="utf-8" %>
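On the resource bundle side, one thing worth checking: before Java 9, `ResourceBundle.getBundle` read `.properties` files as ISO-8859-1, so a file saved as UTF-8 comes back with one char per byte unless the text is written with `\uXXXX` escapes (Java 9 and later read properties bundles as UTF-8 by default). A minimal sketch of loading a bundle with an explicit charset through `PropertyResourceBundle(Reader)` follows; the class name, the `loadUtf8` helper, and the `label` key are illustrative assumptions, and in a real application the stream would come from the properties file rather than an in-memory array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class Utf8BundleDemo {
    // Hypothetical helper: decodes the stream as UTF-8 before parsing it,
    // instead of relying on the platform's default handling.
    static ResourceBundle loadUtf8(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a UTF-8 encoded .properties file with one entry.
        byte[] fileBytes = "label=\u0620\n".getBytes(StandardCharsets.UTF_8);

        ResourceBundle bundle = loadUtf8(new ByteArrayInputStream(fileBytes));
        String value = bundle.getString("label");

        System.out.println(value.length());                             // 1
        System.out.println(String.format("%04X", (int) value.charAt(0))); // 0620
    }
}
```

If the application has to stay on an older Java version and keep using `ResourceBundle.getBundle`, converting the file to `\uXXXX` escapes (for example with the JDK's `native2ascii` tool) is the usual alternative.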