UNICODE URI encoding with Apache HttpClient 4

Posted 2024-08-22 01:46:59


I am using Apache HttpClient 4 for all of my web access.
This means that every query I make has to pass the URI syntax checks.
One of the sites I am trying to access uses UNICODE as the encoding for its URL GET params, i.e.:

http://maya.tase.co.il/bursa/index.asp?view=search&company_group=147&srh_txt=%u05E0%u05D9%u05D1&arg_comp=&srh_from=2009-06-01&srh_until=2010-02-16&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press=

(the param "srh_txt=%u05E0%u05D9%u05D1" encodes srh_txt=ניב in UNICODE)

The problem is that URI doesn't support UNICODE encoding (it only supports UTF-8).
The really big issue here is that this site expects its params to be encoded in UNICODE, so any attempt to convert the URL using String.format("http://...srh_txt=%s&...",URLEncoder.encode( "ניב" , "UTF8"))
results in a URL that is legal and can be used to construct a URI, but the site responds to it with an error message, since it's not the encoding it expects.

By the way, a URL object can be created, and even used to connect to the website, from the unconverted URL.
Is there any way of creating a URI in a non-UTF-8 encoding?
Is there any way of using Apache HttpClient 4 with a regular URL (and not a URI)?

Thanks,
Niv

Answer from 忘你却要生生世世, 2024-08-29 01:46:59


(the param "srh_txt=%u05E0%u05D9%u05D1" encodes srh_txt=ניב in UNICODE)

It doesn't really. That's not URL-encoding and the sequence %u is invalid in a URL.
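A quick way to see this from the Java side: java.net.URI only accepts "%" followed by two hex digits, so the %u form fails to parse while ordinary byte escapes pass. The class and method names below are hypothetical, and the query is shortened from the one in the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PercentUCheck {
    // Returns true if java.net.URI parses the string, false if it throws.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "%u0" is not "%" plus two hex digits, so the parser reports a malformed escape.
        System.out.println(isValidUri("http://maya.tase.co.il/bursa/index.asp?srh_txt=%u05E0%u05D9%u05D1")); // false
        // Two-hex-digit escapes are accepted, whatever bytes they stand for.
        System.out.println(isValidUri("http://maya.tase.co.il/bursa/index.asp?srh_txt=%D7%A0%D7%99%D7%91")); // true
    }
}
```

HttpClient 4's `new HttpGet(String)` parses its argument with `URI.create`, which wraps the same failure in an IllegalArgumentException.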

%u05E0%u05D9%u05D1 encodes ניב only in JavaScript's oddball escape syntax. escape is the same as URL-encoding for all ASCII characters except +, but the %u#### escapes it produces for Unicode characters are entirely its own invention.
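To show where the %u#### form comes from, here is a rough Java imitation of JavaScript's escape(). The class name JsEscape is hypothetical, and the set of untouched characters is taken from the ECMAScript Annex B definition. Note that it works on UTF-16 code units and never produces UTF-8 bytes.

```java
public class JsEscape {
    // Characters that JavaScript's escape() leaves as-is (ECMAScript Annex B).
    private static final String UNESCAPED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./";

    // Imitates escape(): operates on UTF-16 code units, not on encoded bytes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (UNESCAPED.indexOf(c) >= 0) {
                out.append(c);
            } else if (c < 256) {
                // Code units up to U+00FF become a standard-looking %XX escape.
                out.append(String.format("%%%02X", (int) c));
            } else {
                // Everything above U+00FF becomes the non-standard %uXXXX form.
                out.append(String.format("%%u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Hebrew word from the question, written with Java Unicode escapes.
        System.out.println(escape("\u05E0\u05D9\u05D1")); // %u05E0%u05D9%u05D1
    }
}
```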

(One should, in general, never use escape. Using encodeURIComponent instead produces the correct URL-encoded UTF-8, ניב=%D7%A0%D7%99%D7%91.)
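On the Java side, the UTF-8 equivalent is URLEncoder. This sketch assumes Java 10+ for the `Charset` overload (older versions can pass the name "UTF-8" and handle UnsupportedEncodingException); the class name is hypothetical. URLEncoder does form encoding (space becomes "+"), so it differs from encodeURIComponent for a few ASCII characters, but not for Hebrew letters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Param {
    // Percent-encodes a query-parameter value as its UTF-8 bytes.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Hebrew word from the question, written with Java Unicode escapes.
        System.out.println(encode("\u05E0\u05D9\u05D1")); // %D7%A0%D7%99%D7%91
    }
}
```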

If a site requires %u#### sequences in its query string, it is very badly broken.

Is there any way of creating URI in non UTF-8 encoding?

Yes, URIs may use any character encoding you like. It is conventionally UTF-8; that's what IRI requires and what browsers will usually submit if the user types non-ASCII characters into the address bar, but the URI syntax itself is concerned only with bytes.

So you could convert ניב to %F0%E9%E1. There would be no way for the web app to tell that those bytes represent characters encoded in code page 1255 (Hebrew, similar to ISO-8859-8). But it does appear to work on the link above, whereas the UTF-8 version does not. Oh dear!
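A minimal sketch of that approach in plain Java: encode the value's windows-1255 bytes with URLEncoder, then build the URI from the already-encoded string. It assumes Java 10+ and a JDK that includes the extended charsets where windows-1255 lives. The class and method names are hypothetical, and the query is shortened from the question's.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class Cp1255Query {
    // Assumed available: windows-1255 ships in the JDK's extended charsets.
    static final Charset CP1255 = Charset.forName("windows-1255");

    // Percent-encodes the value's windows-1255 bytes instead of its UTF-8 bytes.
    static String encode(String value) {
        return URLEncoder.encode(value, CP1255);
    }

    // URI validates the %XX syntax but cannot know which charset the bytes came from.
    static URI buildUri(String encodedValue) {
        return URI.create("http://maya.tase.co.il/bursa/index.asp?view=search&srh_txt=" + encodedValue);
    }

    public static void main(String[] args) {
        // The Hebrew word from the question, written with Java Unicode escapes.
        String encoded = encode("\u05E0\u05D9\u05D1");
        System.out.println(encoded);                       // %F0%E9%E1
        System.out.println(buildUri(encoded).getRawQuery()); // view=search&srh_txt=%F0%E9%E1
    }
}
```

Because `new HttpGet(String)` in HttpClient 4 parses its argument with `URI.create`, a string built this way should be accepted as-is. Whether every 4.x release sends the raw query completely untouched is worth checking against the version in use.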
