How do I transcode a JavaScript string to ISO-8859-1?

Posted on 2024-08-21 12:37:55 · 1203 characters · 13 views · 0 comments

I'm writing a Chrome extension that works with a website that uses ISO-8859-1. To give some context, my extension makes posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á, these characters appear as Ã¡ in the posted message. Forcing the browser to display the page as UTF-8 instead of ISO-8859-1 makes the á appear correctly.

It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server side code. Any advice?

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

And also:

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

But that doesn't seem to work.

EDIT:

The problem actually lay in how jQuery was URL-encoding the message (or in something along the way). I fixed this by telling jQuery not to process the data and encoding it myself, as shown in the following snippet:

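function cfaqs_post_message(msg) {
  var url = cfaqs_build_post_url();
  // escape() emits %XX for code points up to 0xFF, which matches the
  // ISO-8859-1 bytes. It leaves "+" alone, so that is encoded by hand.
  msg = escape(msg).replace(/\+/g, "%2B");
  $.ajax({
    type: "POST",
    url: url,
    processData: false, // send the string below exactly as written
    data: "message=" + msg + "&post=Preview+Message",
    success: function(html) {
      // ...
    },
    dataType: "html",
    contentType: "application/x-www-form-urlencoded"
  });
}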
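
The snippet above works because the legacy escape() function happens to write code points up to 0xFF as %XX, which is the same byte value ISO-8859-1 uses. It is deprecated, and for higher code points it emits a %uXXXX form that a server is unlikely to understand. A minimal sketch of the same idea without escape() follows; encodeLatin1Component is a hypothetical helper name, not part of jQuery or of the original post, and rejecting characters outside ISO-8859-1 is an assumption of this sketch.

```javascript
// Hypothetical helper (not from the original post): percent-encode a string
// as ISO-8859-1 bytes for an application/x-www-form-urlencoded body.
function encodeLatin1Component(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters pass through unchanged
    } else if (ch === " ") {
      out += "+"; // form encoding writes a space as "+"
    } else if (code <= 0xff) {
      // One ISO-8859-1 byte per character, written as %XX
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      // Not representable in ISO-8859-1; this sketch chooses to fail loudly
      throw new RangeError(
        "Cannot encode U+" + code.toString(16).toUpperCase() + " as ISO-8859-1"
      );
    }
  }
  return out;
}

const body =
  "message=" + encodeLatin1Component("á + b") +
  "&post=" + encodeLatin1Component("Preview Message");
// body is "message=%E1+%2B+b&post=Preview+Message"
```

The resulting string can be passed as data with processData: false, as in the snippet above.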

Comments (3)

放我走吧 2024-08-28 12:37:55

It is my understanding that Javascript uses UTF-8 for its strings

No, no.

Each page has its charset encoding defined in a meta tag, just below the head element:

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page should be saved in the target charset encoding. Otherwise, it will not work as expected.

It is also a good idea to declare the target charset encoding on the server side.

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

It could also be a good idea to declare the charset of each script file if it uses sensitive characters (á, é, í, ó, ú and so on).

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

No, no.

The target server might handle strings in an encoding other than ISO-8859-1. Tomcat, for example, handles requests as ISO-8859-1 no matter how you set up your page. So, on the server side, you may have to set the request encoding to match how you set up your page.

Java
request.setCharacterEncoding("UTF-8")

PHP
// I do not know how to...

If you really want to change the target charset encoding, try the following:

InternetExplorer
    formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
ELSE
    formElement.enctype  = "application/x-www-form-urlencoded; charset=ISO-8859-1";

Alternatively, you could provide a function that returns the numeric representation of each character in the Unicode character set. It will work regardless of the target charset encoding. For instance, á in the Unicode character set is \u00E1.

alert("á without its Unicode Character Set numerical representation");
function convertToUnicodeCharacterSet(value) {
    // Only á is handled here; any other value is returned unchanged
    if (value === "á")
        return "\u00E1";
    return value;
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));
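
Note that in JavaScript source the literal "\u00E1" and the literal "á" produce the same string at run time; what the escape buys is a source file that stays pure ASCII, so it reads the same whatever charset the file is served in. A minimal sketch that generalizes the one-character function above to any string follows. toUnicodeEscapes is a hypothetical name chosen for illustration, and it assumes the input contains no quotes or backslashes that would themselves need escaping.

```javascript
// Hypothetical generalization of convertToUnicodeCharacterSet: replace every
// non-ASCII UTF-16 code unit with its \uXXXX escape sequence.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += unit < 0x80
      ? text[i] // plain ASCII is kept as-is
      : "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// The result contains only ASCII characters and can be pasted into a
// JavaScript string literal in a file of any encoding.
const escaped = toUnicodeEscapes("México, Perú, María");
```

Parsing the escaped text back (for example with JSON.parse after wrapping it in double quotes) yields the original string.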

You can use this link as a guideline (see JavaScript escapes).

Added to the original answer: how I implement this with jQuery

var dataArray = $(formElement).serializeArray();
var queryString = "";
for (var i = 0; i < dataArray.length; i++) {
    // Join pairs with "&" and separate name from value with "=".
    // encodeURIComponent always percent-encodes as UTF-8.
    queryString += (i > 0 ? "&" : "") + encodeURIComponent(dataArray[i]["name"]) + "=" + encodeURIComponent(dataArray[i]["value"]);
}
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});

It works fine without any headache.

Regards,

清风无影 2024-08-28 12:37:55

I had a very similar problem. I needed to pass a URL parameter using jQuery to make an Ajax call, and most of the time the parameter values included accents.

Both pages had to be set to charset=ISO-8859-1, and JavaScript's functions encodeURI, encodeURIComponent, etc. only use UTF-8.

What I did was to create a link in the original page, including all parameters without any encoding, let's say:

var myLink = document.getElementById("myHiddenLink");
myLink.setAttribute("href", "México, Perú, María and any other words with accents and spaces");

and then assign the href value to a variable, like this:

var theLink = myLink.getAttribute("href");

So finally "theLink" variable value was ISO-8859-1 encoded, and everything worked just fine.

枫以 2024-08-28 12:37:55

You can now decode strings using TextDecoder:

const decoded = new TextDecoder('windows-1252').decode(encoded)

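Note that this API treats windows-1252 as equivalent to ISO-8859-1: the iso-8859-1 label is mapped to the windows-1252 decoder, and the two differ only in the 0x80 to 0x9F range. For more, check out https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API/Encodings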
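
As a rough illustration of what TextDecoder does here, the sketch below reproduces the garbled output from the question and then decodes a genuine single-byte payload. TextEncoder only produces UTF-8, so going the other way needs a manual step; encodeLatin1 is a hypothetical helper written for this example, and throwing on characters above 0xFF is an assumption of the sketch. It also assumes a runtime whose TextDecoder supports the windows-1252 label, as browsers do.

```javascript
// "á" (U+00E1) is two bytes in UTF-8: 0xC3 0xA1.
const utf8Bytes = new TextEncoder().encode("\u00E1");

// Reading those two bytes as windows-1252 gives the garbled text from the question.
const misread = new TextDecoder("windows-1252").decode(utf8Bytes); // "Ã¡"

// Decoding bytes that really are single-byte encoded: "México".
const latin1Bytes = new Uint8Array([0x4d, 0xe9, 0x78, 0x69, 0x63, 0x6f]);
const text = new TextDecoder("windows-1252").decode(latin1Bytes);

// Hypothetical helper for the opposite direction: one byte per character,
// valid only for code points up to 0xFF (the ISO-8859-1 range).
function encodeLatin1(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError("Character at index " + i + " is outside ISO-8859-1");
    }
    bytes[i] = code;
  }
  return bytes;
}

const encoded = encodeLatin1("\u00E1"); // a single byte, 0xE1
```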
