How to send parameters with the same encoding from JavaScript?

Posted 2024-08-28 08:55:14 · 1422 characters · 4 views · 0 comments

I have a JavaScript file that lots of people have embedded in their pages. Since I host the file, I have control over the JavaScript itself, but I cannot control the way it is embedded, because lots of people are already using it.

This JavaScript file sends GET requests to my servlets, and the parameters passed with each request are recorded in a DB. For example, the script sends a request to http://myserver.com/servlet?p1=123&p2=aString and the servlet then records 123 and aString in the DB.

Before sending the strings I encode them with encodeURIComponent(). However, I found that clients send the same string with different encodings, depending on either their browser or the site they are visiting. As a result, the same string is represented by different characters when it reaches the servlet (so the servlet sees different strings).

What I am trying to do is convert the strings to a single encoding in JavaScript, so that when they reach the server the same words are represented by the same characters.

How is this possible?

PS. If there is a way to convert the encoding from Java it is also applicable.

Edit: To be more precise, I select some words from the page and send them to the server. That is where the encoding causes problems.

Edit 2: I am NOT sending (and can't send) the GET requests via XMLHttpRequest, because the domains are different. I am using the method @streetpc mentioned: adding a script tag to the head.

Edit 3: At the moment I am sanitizing the strings by replacing non-ASCII characters on the JavaScript side, but I have a feeling that this is not the way to go:

function sanitize(word) {
    /*
    ğ : \u011f
    ü : \u00fc
    ş : \u015f
    ö : \u00f6
    ç : \u00e7
    ı : \u0131
    û : \u00fb
    */
    return encodeURIComponent(
            word.replace(/\u011f/g, '_g')
                .replace(/\u00fc/g, '_u')
                .replace(/\u00fb/g, '_u')
                .replace(/\u015f/g, '_s')
                .replace(/\u00f6/g, '_o')
                .replace(/\u00e7/g, '_c')
                .replace(/\u0131/g, '_i'));
}

Comments (2)

自演自醉 2024-09-04 08:55:14

what I figured out is every client sends the same string with different encodings

Whilst that would be normal for <form> submissions, it should not happen for XMLHttpRequest work. The encodeURIComponent function explicitly always writes URL-encoded UTF-8 bytes, regardless of the encoding of the page from which it was used. Of course persuading your servlet container to allow you to read those UTF-8 bytes without messing them up is another story, but that shouldn't depend on the client.
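A quick way to check this claim is to encode a few of the Turkish characters from the question and compare the result with their UTF-8 byte sequences. This is only a minimal sketch: the helper name `buildLogUrl` is hypothetical, and the URL and parameter names are taken from the example in the question.

```javascript
// encodeURIComponent emits percent-encoded UTF-8 bytes,
// independent of the charset of the embedding page.
function buildLogUrl(base, p1, p2) {
    return base + '?p1=' + encodeURIComponent(p1) + '&p2=' + encodeURIComponent(p2);
}

// The characters are written as \u escapes so this source stays pure ASCII.
var word = '\u011f\u00fc\u015f'; // g with breve, u with diaeresis, s with cedilla
var url = buildLogUrl('http://myserver.com/servlet', '123', word);

console.log(url);
// http://myserver.com/servlet?p1=123&p2=%C4%9F%C3%BC%C5%9F
```

If the servlet still sees different strings for this input, the difference is more likely introduced when the script is loaded or when the server decodes the query string than by encodeURIComponent itself.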

What might be a problem is if you are using raw non-ASCII characters inside your script file itself. In that case the interpretation of those characters will vary according to the charset the browser is using to load the script. This may be affected by:

  1. any charset declared in the Content-Type: text/javascript;charset= header.
  2. any charset attribute declared on the <script src="..." charset="..."> element.
  3. the charset of the page that included the script.

(1) and (2) are not supported in all browsers. Normally you can rely on (3), but as a third-party script author that is out of your control. Therefore you should use only ASCII characters in your script. (Use \u1234 escapes to include non-ASCII characters in string literals in your script to get around this limitation.)
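To keep a hosted script ASCII-only, any non-ASCII literals can be converted to \uXXXX escapes before they are pasted into the file. The helper below is a hypothetical illustration (the name `toAsciiLiteral` is not from the original post); it escapes only characters above 0x7e and does not handle quotes or backslashes.

```javascript
// Convert every character above 0x7e into a \uXXXX escape so the
// result can be pasted into a string literal in an ASCII-only file.
// Works on UTF-16 code units, so astral characters become two escapes.
function toAsciiLiteral(text) {
    var out = '';
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 0x7e) {
            out += '\\u' + ('0000' + code.toString(16)).slice(-4);
        } else {
            out += text.charAt(i);
        }
    }
    return out;
}

// Input given as escapes here as well, so this example is itself ASCII-only.
var escaped = toAsciiLiteral('\u00e7ay \u0131l\u0131k');
console.log(escaped); // \u00e7ay \u0131l\u0131k
```

The regular expressions in the question's sanitize function already follow this convention, so they should be interpreted the same way whatever charset the browser uses to load the script.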

一杯敬自由 2024-09-04 08:55:14

Do you specify the encoding of the JavaScript file in the HTTP headers? For example Content-Type: text/javascript; charset=utf-8, with the .js file itself saved as UTF-8, of course. With Apache, you can configure

AddCharset utf-8 .js 

Or you can make the hosted JavaScript file create another script tag with a charset='utf-8' attribute and add it to the head element (like most bookmarklets do).
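A minimal sketch of that second approach follows. The function name `appendUtf8Script` is hypothetical, and the document object is passed in as a parameter purely so the function can be exercised outside a browser; in a page you would pass the global `document`.

```javascript
// Create a <script> element that declares its charset and append it
// to the <head> of the given document.
function appendUtf8Script(doc, src) {
    var script = doc.createElement('script');
    script.charset = 'utf-8'; // declare the charset before setting src
    script.src = src;
    var head = doc.getElementsByTagName('head')[0];
    head.appendChild(script);
    return script;
}

// Browser usage (the URL is the example from the question):
// appendUtf8Script(document,
//     'http://myserver.com/servlet?p1=123&p2=' + encodeURIComponent(word));
```

Because the same mechanism is already used to send the GET request, the query string can be built with encodeURIComponent at the point where src is assigned.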

I think the javascript being interpreted as UTF-8 code should then get/manipulate UTF-8 strings.

Then, in your Java Servlet, you can specify the input encoding to use:

request.setCharacterEncoding("UTF-8");
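One caveat worth checking against your container's documentation: setCharacterEncoding applies to the request body, while GET parameters travel in the query string, which some containers decode with a separately configured charset. The sketch below, in JavaScript rather than Java, only illustrates what a mismatch looks like; `decodeAsLatin1` is a hypothetical helper that mimics a server decoding the bytes as ISO-8859-1.

```javascript
// Decode a percent-encoded string one byte at a time as ISO-8859-1,
// which is what a server does if it assumes the wrong charset.
function decodeAsLatin1(encoded) {
    return encoded.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var encoded = encodeURIComponent('\u00fc'); // '%C3%BC', the UTF-8 bytes of u with diaeresis
var asUtf8 = decodeURIComponent(encoded);   // one character: '\u00fc'
var asLatin1 = decodeAsLatin1(encoded);     // two characters: '\u00c3\u00bc'

console.log(asUtf8.length, asLatin1.length); // 1 2
```

If the strings stored in the DB look like the two-character form, the client is probably sending correct UTF-8 and the decoding charset on the server side is the thing to fix.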

Edit: check this page about Character Encoding in JavaScript, especially the part named "Setting the Character Encoding".
