How can I send parameters with the same encoding from JavaScript?
I have a JavaScript file that lots of people have embedded in their pages. Since I host the file, I have control over it; I cannot control the way it is embedded, because lots of people are using it already.
This JavaScript file sends GET requests to my servlets, and the parameters passed with each request are recorded in a DB. For example, the script sends a request to http://myserver.com/servlet?p1=123&p2=aString and the servlet then records 123 and aString in the DB somehow.
Before sending strings I use encodeURIComponent() to encode them. But what I figured out is that every client sends the same string with a different encoding, depending on either their browser or the site they are visiting. As a result, the same string is represented with different characters when it reaches the servlet (so they are effectively different strings).
What I am trying to do is convert the strings to a single encoding in JavaScript, so that when they reach the server the same words are represented with the same characters.
How is this possible?
PS: If there is a way to convert the encoding on the Java side, that would also work.
Edit: To be more precise, I select some words from the page and send them to the server. That is where the encoding causes problems.
Edit 2: I am NOT sending (and can't send) GET requests via XMLHttpRequest, because the domains are different. I am using the method that @streetpc mentioned: adding a script tag to head.
Edit 3: At the moment I am sanitizing the strings by replacing non-ASCII characters on the JavaScript side, but I have a feeling that this is not the way to go:
    function sanitize(word) {
        /*
            ğ : \u011f
            ü : \u00fc
            ş : \u015f
            ö : \u00f6
            ç : \u00e7
            ı : \u0131
            û : \u00fb
        */
        return encodeURIComponent(
            word.replace(/\u011f/g, '_g')
                .replace(/\u00fc/g, '_u')
                .replace(/\u00fb/g, '_u')
                .replace(/\u015f/g, '_s')
                .replace(/\u00f6/g, '_o')
                .replace(/\u00e7/g, '_c')
                .replace(/\u0131/g, '_i'));
    }
Comments (2)
Whilst that would be normal for <form> submissions, it should not happen for XMLHttpRequest work. The encodeURIComponent function explicitly always writes URL-encoded UTF-8 bytes, regardless of the encoding of the page from which it was used. Of course, persuading your servlet container to let you read those UTF-8 bytes without messing them up is another story, but that shouldn't depend on the client.
What might be a problem is if you are using raw non-ASCII characters inside your script file itself. In that case the interpretation of those characters will vary according to the charset the browser is using to load the script. This may be affected by:
1. any charset declared in the Content-Type: text/javascript;charset= header;
2. any charset attribute declared on the <script src="..." charset="..."> element;
3. the encoding of the page that embeds the script, if neither of the above is present.
(1) and (2) are not supported in all browsers. Normally you can rely on (3), but as a third-party script author that is out of your control. Therefore you should use only ASCII characters in your script. (Use \u1234 escapes to include non-ASCII characters in string literals in your script to get around this limitation.)
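A minimal sketch of the two points above, assuming the word is written with \u escapes so that the script file itself stays pure ASCII. The function name buildTrackingUrl is hypothetical, and the parameter name p2 is borrowed from the example URL in the question.

```javascript
// Hypothetical helper: builds the logging URL for one word.
// encodeURIComponent percent-encodes the UTF-8 bytes of the string,
// whatever the encoding of the page that embeds the script.
function buildTrackingUrl(base, word) {
  return base + '?p2=' + encodeURIComponent(word);
}

// Non-ASCII characters written as \u escapes, so the file is ASCII-only.
// These are g-breve, u-umlaut and s-cedilla from the question's list.
var word = '\u011f\u00fc\u015f';
var url = buildTrackingUrl('http://myserver.com/servlet', word);
// url is 'http://myserver.com/servlet?p2=%C4%9F%C3%BC%C5%9F'
```

Every client produces the same percent-encoded bytes for the same word this way, so any remaining difference would have to come from how the server decodes the query string.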
Do you specify the encoding of the JavaScript file in the HTTP headers? Like
Content-type: text/javascript; charset=utf-8
with the .js file being saved in UTF-8, of course. With Apache, you can configure this header server-side.
Or you can make the hosted JavaScript file create another script tag with a charset='utf-8' parameter and add it to the head element (like most bookmarklets do).
I think the JavaScript, being interpreted as UTF-8 code, should then get and manipulate UTF-8 strings.
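The script-tag approach could look roughly like the sketch below. The function name loadScriptAsUtf8 and the file name script.js are hypothetical, and the document object is passed in as a parameter so the snippet does not assume a particular page. Whether a given browser honors the charset attribute is not guaranteed.

```javascript
// Hypothetical helper: appends a <script> element that asks the browser
// to read the file at `src` as UTF-8.
function loadScriptAsUtf8(doc, src) {
  var script = doc.createElement('script');
  script.setAttribute('charset', 'utf-8');
  script.setAttribute('src', src);
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}

// In a browser (URL is illustrative):
// loadScriptAsUtf8(document, 'http://myserver.com/script.js');
```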
Then, in your Java Servlet, you can specify the input encoding to use:
Edit: check this page about Character Encoding in JavaScript, especially the part named "Setting the Character Encoding".