Is a colon ":" safe to use in friendly URLs?

Posted 2024-08-18 08:36:55

We are designing a URL system that will specify application sections as words separated by slashes. Specifically, this is in GWT, so the relevant parts of the URL will be in the hash (which will be interpreted by a controller layer on the client-side):

http://site/gwturl#section1/section2

Some sections may need additional attributes, which we'd like to specify with a :, so that the section parts of the URL are unambiguous. The code would split first on /, then on :, like this:

http://site/gwturl#user:45/comments
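
As a rough illustration of the "split on /, then on :" idea, here is a minimal Java sketch. The class name FragmentSections and its Section type are hypothetical and not part of GWT; the sketch assumes the fragment has already been obtained as a plain string without the leading "#".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a GWT API): splits a fragment such as
// "user:45/comments" into sections, then splits each section on its first ':'.
class FragmentSections {

    // One section of the fragment: a name plus an optional attribute.
    static final class Section {
        final String name;
        final String attribute; // null when the section has no ':' part

        Section(String name, String attribute) {
            this.name = name;
            this.attribute = attribute;
        }
    }

    static List<Section> parse(String fragment) {
        List<Section> sections = new ArrayList<>();
        for (String part : fragment.split("/")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from leading or doubled slashes
            }
            int colon = part.indexOf(':');
            if (colon < 0) {
                sections.add(new Section(part, null));
            } else {
                // Everything after the first ':' is treated as the attribute.
                sections.add(new Section(part.substring(0, colon),
                                         part.substring(colon + 1)));
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        for (Section s : parse("user:45/comments")) {
            System.out.println(s.name + " -> " + s.attribute);
        }
        // prints:
        // user -> 45
        // comments -> null
    }
}
```

Splitting on the first colon only is a design choice made for this sketch; it leaves any further colons inside the attribute value.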

Of course, we are doing this for URL-friendliness, so we'd like to make sure that none of these characters, which will hold special meaning, get URL-encoded by browsers or any other system, leaving us with a URL like this:

http://site/gwturl#user%3A45/comments <--- BAD

Is using the colon in this way safe (by which I mean it won't be automatically encoded) for browsers, bookmarking systems, even JavaScript or Java code?

Comments (11)

白况 2024-08-25 08:36:55

I recently wrote a URL encoder, so this is pretty fresh in my mind.

http://site/gwturl#user:45/comments

All the characters in the fragment part (user:45/comments) are perfectly legal for RFC 3986 URIs.

The relevant parts of the ABNF:

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

Apart from these restrictions, the fragment part has no defined structure beyond the one your application gives it. The scheme, http, only says that you don't send this part to the server.
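
For anyone who wants to check a fragment against that grammar mechanically, the following is a sketch that transcribes the ABNF above into a Java regular expression. The class and method names are invented for illustration; the only library calls are java.util.regex and java.net.URI.create.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical checker: a regex transcription of the RFC 3986 rule
// fragment = *( pchar / "/" / "?" ).
class FragmentGrammar {

    private static final Pattern FRAGMENT = Pattern.compile(
        "(?:[A-Za-z0-9\\-._~]"      // unreserved
        + "|%[0-9A-Fa-f]{2}"        // pct-encoded (hex digits in either case)
        + "|[!$&'()*+,;=]"          // sub-delims
        + "|[:@/?])*");             // ":" and "@" from pchar, plus "/" and "?"

    static boolean isValidFragment(String fragment) {
        return FRAGMENT.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("user:45/comments")); // true
        System.out.println(isValidFragment("user 45"));          // false, space is not allowed

        // java.net.URI parses the URL and reports the fragment unchanged.
        URI uri = URI.create("http://site/gwturl#user:45/comments");
        System.out.println(uri.getRawFragment()); // user:45/comments
    }
}
```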


EDIT:

D'oh!

Despite my assertions about the URI spec, irreputable provides the correct answer when he points out that the HTML 4 spec restricts element names/identifiers.

Note that identifier rules are changing in HTML 5. URI restrictions will still apply (at time of writing, there are some unresolved issues around HTML 5's use of URIs).

鱼窥荷 2024-08-25 08:36:55

MediaWiki and other wiki engines use colons in their URLs to designate namespaces, with apparently no major problems.

e.g. http://en.wikipedia.org/wiki/Template:Welcome

瞄了个咪的 2024-08-25 08:36:55

In addition to McDowell's analysis of the URI standard, remember also that the fragment must be a valid HTML anchor name. According to http://www.w3.org/TR/html4/types.html#type-name

ID and NAME tokens must begin with a
letter ([A-Za-z]) and may be followed
by any number of letters, digits
([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods
(".").

So you are in luck. ":" is explicitly allowed. And nobody should "%"-escape it, not only because "%" is an illegal character there, but also because the fragment must match the anchor name character by character, so no agent should try to tamper with it in any way.

However, you have to test it. Web standards are not strictly followed, and sometimes the standards conflict. For example, HTTP/1.1 RFC 2616 does not allow a query string in the request URL, while HTML constructs one when submitting a form with the GET method. Whichever is implemented in the real world wins at the end of the day.
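
The HTML 4 rule quoted above can be written as a regular expression, which may help when testing tokens mechanically. This is only a sketch, and the class and method names are made up for illustration. Note that the rule describes a single token such as user:45; a slash is not in the quoted character set, so a whole fragment like user:45/comments does not match it.

```java
import java.util.regex.Pattern;

// Hypothetical checker for the HTML 4 ID/NAME token rule quoted above:
// a letter, followed by letters, digits, "-", "_", ":" and ".".
class Html4NameToken {

    private static final Pattern NAME =
        Pattern.compile("[A-Za-z][A-Za-z0-9\\-_:.]*");

    static boolean isNameToken(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNameToken("user:45"));          // true, ":" is allowed
        System.out.println(isNameToken("user%3A45"));        // false, "%" is not allowed
        System.out.println(isNameToken("user:45/comments")); // false, "/" is not in the set
    }
}
```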

冰火雁神 2024-08-25 08:36:55

I wouldn't count on it. It'll likely get url encoded as %3A by many user-agents.

べ映画 2024-08-25 08:36:55

Google also uses colons.

In this specification, they use colons for the custom method names.

情愿 2024-08-25 08:36:55

From URLEncoder javadoc:

For more information about HTML form
encoding, consult the HTML
specification.

When encoding a String, the following
rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • The special characters ".", "-", "*", and "_" remain the same.
  • The space character " " is converted into a plus sign "+".
  • All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.

That is, : is not safe.
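
A quick way to see this behaviour is to run a value through URLEncoder directly. The wrapper class below is hypothetical; URLEncoder.encode(String, String) is the standard JDK call. Keep in mind that, as the javadoc says, this class implements HTML form encoding, so it only affects strings that code explicitly passes through it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class showing what java.net.URLEncoder does to ':' and '/'.
class FormEncodingDemo {

    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("user:45/comments")); // user%3A45%2Fcomments
        System.out.println(formEncode("a b"));              // a+b
        System.out.println(formEncode("a.b-c*d_e"));        // a.b-c*d_e
    }
}
```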

生生漫 2024-08-25 08:36:55

I don't see Firefox or IE8 encoding some of the Wikipedia URLs that include the character.

风吹雨成花 2024-08-25 08:36:55

Colons are used as the split between username and password if a protocol requires authentication.

焚却相思 2024-08-25 08:36:55

Apache URIBuilder and JAX-RS UriBuilder classes treat : differently (they also treat curly braces differently):

new URIBuilder("http://localhost").setCustomQuery("foo=a:b&bar={}").buildString()

outputs

http://localhost?foo=a:b&bar=%7B%7D

UriBuilder.fromPath("http://localhost").queryParam("foo", "a:b").queryParam("bar", "{}").toTemplate()

outputs

http://localhost?foo=a%3Ab&bar={}

So Apache URIBuilder does not seem to encode : but does encode {}, while for JAX-RS UriBuilder it is the other way around.

┈┾☆殇 2024-08-25 08:36:55

Colon isn't safe. See here

难以启齿的温柔 2024-08-25 08:36:55

It is not a safe character: when it appears right after your domain name, it is used to indicate which port you connect to.
