Java String Encoding Conversion Within a Web Page

Posted on 2024-08-20 19:37:18

I have a webpage that is encoded (through its header) as WIN-1255. A Java program creates text strings that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, so the text field in the page shows up as gibberish.

Unfortunately, I cannot change the page encoding; it's required by a customer's proprietary system.

Any ideas?

UPDATE:

The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

SECOND UPDATE:

Thanks for all the responses. I managed to convert the string, and yet it was still gibberish. The problem was that the XML encoding declaration has to be set in addition to the header encoding.
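A minimal sketch of what the second update describes, assuming the feed is written by hand through a `Writer` (the class and method names are hypothetical): the bytes are encoded as windows-1255 and the XML declaration names the same encoding. XML parsers typically assume UTF-8 when there is no encoding declaration, which is why fixing only the HTTP header was not enough. The `Content-Type` header still has to be set separately by whatever serves the feed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class RssFeedSketch {
    // Java's name for the page encoding (alias "cp1255").
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Hypothetical minimal escaping for XML text content; a real feed should use an XML library.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a one-item RSS document whose bytes AND XML declaration are both windows-1255.
    static byte[] buildFeed(String itemTitle) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, WIN1255)) {
            // The declared encoding must match the encoding the Writer actually uses.
            w.write("<?xml version=\"1.0\" encoding=\"windows-1255\"?>\n");
            w.write("<rss version=\"2.0\"><channel><item><title>");
            w.write(escapeXml(itemTitle));
            w.write("</title></item></channel></rss>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

Passing a Hebrew title to `buildFeed` yields bytes that start with the windows-1255 declaration and contain one byte per Hebrew letter.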

Adam


Comments (5)

超可爱的懒熊 2024-08-27 19:37:18

To the point: you need to set the encoding of the response writer. A response header alone only tells the client application which encoding to use to interpret and display the page. That won't work if the response body itself is written in a different encoding.
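To illustrate the difference between the header and the writer, here is a small sketch (hypothetical class name) that simulates what the browser sees: the same Hebrew string encoded as UTF-8 but decoded as windows-1255 turns into gibberish, while using windows-1255 on both sides round-trips cleanly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Bytes written as UTF-8, but the header tells the client they are windows-1255.
    static String utf8BytesReadAsWin1255(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, WIN1255);
    }

    // Bytes written as windows-1255, matching what the header announces.
    static String win1255BytesReadAsWin1255(String text) {
        byte[] win = text.getBytes(WIN1255);
        return new String(win, WIN1255);
    }

    public static void main(String[] args) {
        String hebrew = "\u05e9\u05dc\u05d5\u05dd"; // "shalom"
        System.out.println(utf8BytesReadAsWin1255(hebrew));    // garbled: 8 chars from 8 UTF-8 bytes
        System.out.println(win1255BytesReadAsWin1255(hebrew)); // the original 4 letters
    }
}
```

Each Hebrew letter takes two bytes in UTF-8, and a single-byte decoder turns every one of those bytes into its own (wrong) character.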

The context in which you have this problem is entirely unclear (please elaborate on it in future questions like this), so here are several solutions:

If it is a JSP, put the following at the top of the JSP to set the response encoding:

<%@ page pageEncoding="windows-1255" %>

If it is a servlet, call the following before the response is committed (that is, before the first flush) and before calling getWriter(), to set the response encoding:

response.setCharacterEncoding("windows-1255");

Both, by the way, implicitly set the charset parameter of the Content-Type response header, which instructs the client to use the same encoding to interpret and display the page. Also see this article for more information.

If it is a homegrown application that relies on the basic java.net and/or java.io APIs, then you need to write the characters through an OutputStreamWriter created with the two-argument constructor, whose second argument specifies the encoding:

Writer writer = new OutputStreamWriter(someOutputStream, "windows-1255");
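A self-contained sketch of the two-argument `OutputStreamWriter` approach, assuming a `ByteArrayOutputStream` stands in for `someOutputStream` so the resulting bytes can be inspected (the class name is hypothetical). Java knows this code page as `windows-1255` (alias `cp1255`); a spelling such as `WIN-1255` is not one of the names it recognizes and would fail with an `UnsupportedEncodingException`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class WriterEncodingSketch {
    // Writes text through an OutputStreamWriter bound to windows-1255 and returns the raw bytes.
    static byte[] writeAsWin1255(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for someOutputStream
        // Two-argument constructor: the second argument selects the encoding.
        try (Writer writer = new OutputStreamWriter(out, "windows-1255")) {
            writer.write(text);
        } catch (IOException e) { // includes UnsupportedEncodingException for an unknown name
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With a Hebrew string, each letter becomes a single byte (shin, for example, is 0xF9) instead of the two bytes per letter that UTF-8 produces.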
美男兮 2024-08-27 19:37:18

Assuming you have control of the original (properly represented) strings, and simply need to output them in win-1255:

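import java.nio.charset.*;
import java.nio.*;
// "windows-1255" is the Java name of the Hebrew Windows code page
Charset win1255 = Charset.forName("windows-1255");
ByteBuffer bb = win1255.encode(someString);
byte[] ba = new byte[bb.limit()];
bb.get(ba); // copy the encoded bytes; without this line ba stays all zeros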

Then simply write the contents of ba to the appropriate place.
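One caveat with `Charset.encode`: it silently replaces characters that windows-1255 cannot represent (CJK characters or emoji, for example) with `?`. If you would rather detect that, here is a sketch using a `CharsetEncoder` configured to report errors (the class name is hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictWin1255 {
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Like win1255.encode(s), but throws instead of substituting '?' for unmappable characters.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder enc = WIN1255.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] ba = new byte[bb.remaining()];
        bb.get(ba); // copy the encoded bytes out of the buffer
        return ba;
    }
}
```

Catching `CharacterCodingException` then lets you decide how to handle text that cannot appear in a windows-1255 page at all.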

EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:

ServletOutputStream os = ...
os.write(ba);

We also should not overlook the possible approach of calling setContentType("text/html; charset=windows-1255") and then using getWriter normally. You did not make completely clear whether windows-1255 was being set in a meta tag or in the HTTP response header.

You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, fixing that should be no big deal: just look at InputStreamReader(someInputStream, Charset.forName("utf-8")).
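A sketch of that decoding step, assuming the UTF-8 data arrives as an `InputStream` (the class name is hypothetical). It uses `StandardCharsets.UTF_8` (Java 7+), which is equivalent to `Charset.forName("utf-8")` but skips the name lookup:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8ReadSketch {
    // Decodes an entire UTF-8 stream into a Java String.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Once the text is a correctly decoded `String`, any of the windows-1255 output approaches above can be applied to it.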

鲜血染红嫁衣 2024-08-27 19:37:18

What's embedding the data in the page? Either it should read the data as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255), or you should change the Java program to create the files (or whatever it produces) in Win-1255 to begin with.
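The first option, reading as UTF-8 text and writing it back out as windows-1255, can be sketched as a plain stream-to-stream transcoder (class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Reads UTF-8 text from `in` and writes the same characters to `out` as windows-1255.
    static void copyUtf8ToWin1255(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(out, WIN1255);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push buffered bytes; the caller owns and closes both streams
    }

    // Convenience wrapper for in-memory byte arrays.
    static byte[] utf8ToWin1255(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyUtf8ToWin1255(new ByteArrayInputStream(utf8), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Decoding to characters first and re-encoding afterwards is what makes this work; copying the raw UTF-8 bytes into a windows-1255 page cannot.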

If you can give more details about how the system works (what generates the web page? how does it interact with the Java program?), things will become a lot clearer.

冬天的雪花 2024-08-27 19:37:18

The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
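A minimal sketch of this approach, assuming the source feed is available as raw bytes and that the first `title` element is what should be copied (both are assumptions, and the class name is hypothetical). The DOM parser reads the document's own UTF-8 declaration, so no manual decoding is needed; the output side then encodes the resulting `String` as windows-1255:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class FeedTitles {
    // Parses feed XML from raw bytes; the parser honors the document's encoding declaration.
    static String firstTitle(byte[] feedBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(feedBytes));
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    // Output side: encode the (UTF-16) Java String into windows-1255 bytes.
    static byte[] toWin1255(String s) {
        return s.getBytes(Charset.forName("windows-1255"));
    }
}
```

Passing bytes rather than a pre-decoded `String` to the parser is deliberate: it lets the XML declaration, not a guess, decide how the input is decoded.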

混浊又暗下来 2024-08-27 19:37:18
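import java.nio.charset.Charset;

byte[] originalUtf8 = ...; // Here input: the raw UTF-8 bytes

// UTF-8 bytes -> Java String
String internal = new String(originalUtf8, Charset.forName("utf-8"));
// Java String -> windows-1255 bytes ("cp1255" is an alias of windows-1255)
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));

// Here output: write win1255 wherever the page is produced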
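The snippet above uses `cp1255` while other answers use `windows-1255`. A quick check (hypothetical class name) shows they resolve to the same `Charset`; `Charset.isSupported` can also be used to test a spelling such as `WIN-1255` before relying on it:

```java
import java.nio.charset.Charset;

class CharsetNames {
    public static void main(String[] args) {
        Charset a = Charset.forName("cp1255");
        Charset b = Charset.forName("windows-1255");
        System.out.println(a.name());     // canonical name of the charset
        System.out.println(a.equals(b));  // Charset equality compares canonical names
        // Check a name before passing it to setCharacterEncoding or OutputStreamWriter.
        System.out.println(Charset.isSupported("WIN-1255"));
    }
}
```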