Are square brackets allowed in URLs?

Posted 2024-07-05 22:26:21 · 328 words · 15 views · 0 comments

Are square brackets in URLs allowed?

I noticed that Apache Commons HttpClient (3.0.1) throws an IOException for such URLs, whereas wget and Firefox accept square brackets.

URL example:

http://example.com/path/to/file[3].html

My HTTP client encounters such URLs, but I'm not sure whether to patch the code to accept them or to throw an exception (which may well be the correct behavior).

Comments (10)

贪了杯 2024-07-12 22:26:22

Unencoded square brackets [ and ] in URLs are often not supported.

Replace them with %5B and %5D:

  • From the command line (this example uses bash and sed):

    url='http://example.com?day=[0-3][0-9]'
    encoded_url="$( sed 's/\[/%5B/g;s/]/%5D/g' <<< "$url")"
    
  • Using Java URLEncoder.encode(String s, String enc) (this applies form encoding, so it suits individual query values rather than a whole URL)

  • Using PHP rawurlencode() or urlencode()

    <?php
    echo '<a href="http://example.com/day/',
        rawurlencode('[0-3][0-9]'), '">';
    ?>
    

    output:

    <a href="http://example.com/day/%5B0-3%5D%5B0-9%5D">
    

    or:

    <?php
    $query_string = 'day=' . urlencode('[0-3][0-9]') .
                    '&month=' . urlencode('[0-1][0-9]');
    echo '<a href="http://example.com?',
          htmlentities($query_string), '">';
    ?>
    
  • Using your favorite programming language... Please extend this answer, by posting a comment or editing it directly, with the function you use in your programming language ;-)

For more details, see RFC 3986, which specifies the URI syntax. Its Appendix A collects the ABNF grammar: square brackets belong to "gen-delims", so they must be %-encoded when they appear as data, for example in a path or query string.
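
For Java, the replacement described above can be sketched without any library. This is a minimal illustration, not a full URL encoder: the class name BracketEncoder and the method encodeBrackets are hypothetical, and the sketch assumes the authority ends at the first "/", "?" or "#" after "://". It leaves the authority untouched so that a bracketed IPv6 host keeps its brackets.

```java
final class BracketEncoder {

    /**
     * Percent-encodes '[' and ']' everywhere except in the authority part,
     * where brackets may legitimately enclose an IPv6 literal.
     * Hypothetical helper for illustration only.
     */
    static String encodeBrackets(String url) {
        int start = 0;
        int schemeEnd = url.indexOf("://");
        if (schemeEnd >= 0) {
            // Skip the authority: it ends at the first '/', '?' or '#'.
            start = schemeEnd + 3;
            while (start < url.length() && "/?#".indexOf(url.charAt(start)) < 0) {
                start++;
            }
        }
        String head = url.substring(0, start);
        String tail = url.substring(start)
                .replace("[", "%5B")
                .replace("]", "%5D");
        return head + tail;
    }

    public static void main(String[] args) {
        System.out.println(encodeBrackets("http://example.com/path/to/file[3].html"));
        System.out.println(encodeBrackets("http://example.com?day=[0-3][0-9]"));
        System.out.println(encodeBrackets("http://[::1]:8080/file[3].html"));
    }
}
```

Only the two bracket characters are handled here; any other character that needs escaping would have to be treated separately.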

菊凝晚露 2024-07-12 22:26:22

Pretty much the only characters not allowed in pathnames are # and ? as they signify the end of the path.

The URI RFC has the definitive answer:

http://www.ietf.org/rfc/rfc1738.txt

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

The answer is that they should be hex-encoded, but given Postel's law, most software will accept them verbatim.
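
A client that wants to decide between patching and rejecting could first check which of these characters a URL contains. The following Java sketch assumes the RFC 1738 list quoted above, minus "#" and "%", since both have a legitimate role in an already well-formed URL. The names UnsafeCharFinder and unsafeCharsIn are hypothetical.

```java
final class UnsafeCharFinder {

    // Characters RFC 1738 lists as unsafe, minus '#' and '%', which act as
    // fragment delimiter and escape prefix in a well-formed URL.
    private static final String UNSAFE = " <>\"{}|\\^~[]`";

    /** Returns each unsafe character found in the URL once, in order of first appearance. */
    static String unsafeCharsIn(String url) {
        StringBuilder found = new StringBuilder();
        for (char c : url.toCharArray()) {
            if (UNSAFE.indexOf(c) >= 0 && found.indexOf(String.valueOf(c)) < 0) {
                found.append(c);
            }
        }
        return found.toString();
    }

    public static void main(String[] args) {
        System.out.println(unsafeCharsIn("http://example.com/path/to/file[3].html"));
        System.out.println(unsafeCharsIn("http://example.com/path/to/file%5B3%5D.html"));
    }
}
```

An empty result means the URL contains none of the listed characters; it does not prove that the URL is otherwise valid.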

离线来电— 2024-07-12 22:26:22

I know this question is a bit old, but I just wanted to note that PHP uses brackets to pass arrays in a URL.

http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3

In this case $_GET['bar'] will contain array(1, 2, 3).
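
To show the convention from the receiving side, here is a rough Java sketch that groups such parameters into lists. It is not PHP's actual parser: BracketQueryParser and parse are hypothetical names, nested keys such as bar[a][b] are not handled, and values are not percent-decoded. It accepts both the literal [] suffix and its encoded form %5B%5D.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BracketQueryParser {

    /** Groups "name[]=value" pairs of a raw query string into lists keyed by name. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return result;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Strip the array suffix, whether literal or percent-encoded.
            if (key.endsWith("[]")) {
                key = key.substring(0, key.length() - 2);
            } else if (key.toUpperCase().endsWith("%5B%5D")) {
                key = key.substring(0, key.length() - 6);
            }
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("bar[]=1&bar[]=2&bar[]=3")); // prints {bar=[1, 2, 3]}
    }
}
```

Because the encoded suffix is treated the same way, a client can send bar%5B%5D=1 to a parser like this and still get one list back.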

枉心 2024-07-12 22:26:22

Stack Overflow does not seem to encode them:

https://stackoverflow.com/search?q=square+brackets+[url]

╰沐子 2024-07-12 22:26:22

If you are using Commons HttpClient, look into the org.apache.commons.httpclient.util.URIUtil class, specifically the encode() method. Use it to URI-encode the URL before trying to fetch it.

难以启齿的温柔 2024-07-12 22:26:22

Any browser or web-enabled software that accepts URLs and does not throw an exception when special characters are introduced is almost guaranteed to be encoding those characters behind the scenes. Curly brackets, square brackets, spaces, etc. all have encoded representations so that they do not cause conflicts. As the previous answers say, the safest way to deal with these characters is to URL-encode them before handing the URL to anything that will try to resolve it.

这样的小城市 2024-07-12 22:26:22

It is best to URL-encode them, as clearly not all web servers support them. Sometimes, even when there is a standard, not everyone follows it.

离不开的别离 2024-07-12 22:26:22

According to the URL specification, square brackets are not valid URL characters.

Here is the relevant snippet:

The "national" and "punctuation" characters do not appear in any
productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | ^ | ~
punctuation < | >

没有伤那来痛 2024-07-12 22:26:22

Square brackets are considered unsafe, but the majority of browsers parse them correctly. That said, it is better to replace square brackets with other characters.

盛夏尉蓝 2024-07-12 22:26:21

RFC 3986 states:

A host identified by an Internet
Protocol literal address, version 6
[RFC3513] or later, is distinguished
by enclosing the IP literal within
square brackets ("[" and "]"). This
is the only place where square bracket
characters are allowed in the URI
syntax.

So, in theory, you should not see such URIs in the wild; they should arrive percent-encoded.
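
The distinction between a bracketed host and brackets elsewhere can be observed with Java's standard java.net.URI class. Note the assumption: per its documentation, that class follows RFC 2396 as amended by RFC 2732 rather than RFC 3986 itself, so this sketch shows one strict parser's behavior, not the RFC 3986 grammar. The class name BracketParsingDemo and the helper parses are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class BracketParsingDemo {

    /** Returns true if java.net.URI accepts the string, false if it rejects it. */
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Brackets around an IPv6 literal in the host are accepted.
        System.out.println(parses("http://[::1]:8080/index.html"));
        // Raw brackets in the path are rejected.
        System.out.println(parses("http://example.com/path/to/file[3].html"));
        // The percent-encoded form of the same path is accepted.
        System.out.println(parses("http://example.com/path/to/file%5B3%5D.html"));
    }
}
```

A client built on a strict parser like this one has to encode the brackets itself before parsing if it wants to tolerate such URLs.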
