HttpClient problem with URLs that contain curly braces

Posted on 2024-11-24 10:24:07

I am using HttpClient in my Android application. At some point, I have to fetch data from remote locations. Below is a snippet showing how I use HttpClient to get the response.

String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg"; //my url string
DefaultHttpClient httpClient = new DefaultHttpClient();
response = httpClient.execute(new HttpGet(url_s));

It works absolutely fine in most cases, but not when there are curly braces in my URL, which is basically a String. The stack trace shows me the index of the curly brace and says "Invalid character".
So I tried to create a URI from the URL.

URL url = new URL(url_s);
URI uri = url.toURI();
response = httpClient.execute(new HttpGet(uri));

After doing so, I didn't get a result from the remote location at all. I worked around the problem by replacing the curly braces:

  • "{" with "%7B"
  • "}" with "%7D"

But I am not totally satisfied with this solution. Is there a better one, something neat and not hard-coded like mine?

Comments (2)

木森分化 2024-12-01 10:24:07

The strict answer is that you should never have curly braces in your URL.

A full description of valid URLs can be found in RFC 1738.

The pertinent part for this answer is as follows:

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

To get around the problem you have been experiencing, you must encode your URL.

The "host may not be null" error occurs when the entire URL is encoded, including the https://mydomain.com/ part, so the host can no longer be parsed. You only want to encode the last part of the URL, called the path.

The solution is to use the Uri.Builder class to build your URI from its individual parts, which encodes the path in the process.

You will find a detailed description in the Android SDK Uri.Builder reference documentation.

Some trivial examples using your values are:

Uri.Builder b = Uri.parse("https://mydomain.com").buildUpon();
b.path("/abc/{5D/{B0blhahblah-blah}I1.jpg");
Uri u = b.build();

Or you can use chaining:

    Uri u = Uri.parse("https://mydomain.com").buildUpon().path("/abc/{5D/{B0blhahblah-blah}I1.jpg").build();
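
Outside Android, a similar effect can be sketched with plain java.net.URI. This is only an illustration, not part of the Uri.Builder approach above. It assumes the scheme, host and raw path are available as separate strings, and the class and method names (BraceUrlSketch, encode) are made up for the example. The multi-argument URI constructor percent-encodes characters that are illegal in the path, including curly braces.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, named only for this illustration.
public class BraceUrlSketch {

    // Builds a URI from separate components. The multi-argument constructor
    // quotes characters that are illegal in a path, such as '{' and '}'.
    static String encode(String scheme, String host, String rawPath)
            throws URISyntaxException {
        URI uri = new URI(scheme, host, rawPath, null); // null = no fragment
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Values taken from the question.
        System.out.println(
                encode("https", "mydomain.com", "/abc/{5D/{B0blhahblah-blah}I1.jpg"));
        // prints https://mydomain.com/abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg
    }
}
```

Note that this constructor also quotes a literal "%", so it is only suitable for a path that has not been percent-encoded already.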
终止放荡 2024-12-01 10:24:07

Except that RFC 1738 has been obsolete for over a decade and has been superseded by RFC 3986, and there is no indication in:

https://www.rfc-editor.org/rfc/rfc3986

that curly braces are unsafe (in fact, the RFC does not contain a single curly brace character anywhere). Furthermore, I've tried URIs containing curly braces in browsers, and they work fine.

Also note that the OP is using a class called URI, which should definitely follow RFC 3986 at the very least, if not RFC 3987.
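
How the class actually behaves can be checked directly. The following is a minimal sketch, assuming plain java.net.URI, whose Javadoc describes its syntax in terms of RFC 2396 rather than RFC 3986. The class and method names (RawBraceCheck, parses) and the shortened test URLs are made up for the example. The single-argument constructor does no encoding, so a raw curly brace is rejected with a URISyntaxException, which would be consistent with the "Invalid character" stack trace in the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, named only for this illustration.
public class RawBraceCheck {

    // Returns true if java.net.URI accepts the string as-is (no encoding applied).
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A raw '{' in the path is rejected by the single-argument constructor.
        System.out.println(parses("https://mydomain.com/abc/{5D/x.jpg"));   // false
        // The percent-encoded form of the same path is accepted.
        System.out.println(parses("https://mydomain.com/abc/%7B5D/x.jpg")); // true
    }
}
```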

However, oddly, the IRI specification at:

https://www.rfc-editor.org/rfc/rfc3987

has the following note:

Systems accepting IRIs MAY also deal with the printable characters
in US-ASCII that are not allowed in URIs, namely "<", ">", '"',
space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these
characters are found but are not converted, then the conversion
SHOULD fail. Please note that the number sign ("#"), the percent
sign ("%"), and the square bracket characters ("[", "]") are not part
of the above list and MUST NOT be converted.

In other words, it looks like the RFCs themselves have some issues.
