What is the semicolon reserved for in URLs?

Posted on 2024-08-19 19:09:06

The RFC 3986 URI: Generic Syntax specification lists a semicolon as a reserved (sub-delim) character:

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

What is the reserved purpose of the semicolon (";") in URIs? For that matter, what is the purpose of the other sub-delims (I'm only aware of purposes for "&", "+", and "=")?

Comments (6)

音盲 2024-08-26 19:09:06

There is an explanation at the end of section 3.3.

Aside from dot-segments in
hierarchical paths, a path segment is
considered opaque by the generic
syntax. URI producing applications
often use the reserved characters
allowed in a segment to delimit
scheme-specific or
dereference-handler-specific
subcomponents. For example, the
semicolon (";") and equals ("=")
reserved characters are often used
to delimit parameters and parameter
values applicable to that segment.
The comma (",") reserved character is
often used for similar purposes.
For example, one URI producer might
use a segment such as "name;v=1.1"
to indicate a reference to version 1.1
of "name", whereas another might
use a segment such as "name,1.1" to
indicate the same. Parameter types
may be defined by scheme-specific
semantics, but in most cases the
syntax of a parameter is specific to
the implementation of the URI's
dereferencing algorithm.

In other words, it is reserved so that people who want a delimited list of something in the URL can safely use ; as a delimiter even if the parts contain ;, as long as the contents are percent-encoded. For example, you can write:

foo;bar;baz%3bqux

and interpret it as three parts: foo, bar, baz;qux. If the semicolon were not a reserved character, ; and %3b would be equivalent, so the URI would be incorrectly interpreted as four parts: foo, bar, baz, qux.
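The split-then-decode order described above can be sketched in Python. This is a minimal illustration, assuming a segment that uses ; as its only delimiter; the helper names split_segment and join_segment are hypothetical and not part of any library.

```python
from urllib.parse import quote, unquote


def split_segment(segment: str) -> list[str]:
    """Split on the literal ';' delimiter first, then percent-decode each part."""
    return [unquote(part) for part in segment.split(";")]


def join_segment(parts: list[str]) -> str:
    """Percent-encode each part (including any ';' inside it), then join with ';'."""
    return ";".join(quote(part, safe="") for part in parts)


# Correct order: split, then decode. The encoded %3b stays inside its part.
print(split_segment("foo;bar;baz%3bqux"))

# Wrong order: decoding first turns %3b into a delimiter and yields four parts.
print(unquote("foo;bar;baz%3bqux").split(";"))
```

Decoding before splitting is exactly the mistake that reserving the character is meant to rule out.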

三生一梦 2024-08-26 19:09:06

The intent is clearer if you go back to older versions of the specification:

  path_segments = segment *( "/" segment )
  segment       = *pchar *( ";" param ) 

Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.

I believe it has its origins in FTP URIs.
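As a rough sketch of how a consumer might read a segment written in that older style, the Python below separates the segment's name from its ;-delimited parameters. The function name parse_segment and the choice to return parameters as (key, value) pairs are assumptions made for illustration; the old grammar itself only says that params follow the semicolons.

```python
from urllib.parse import unquote


def parse_segment(segment: str):
    """Split one path segment into its name and its ';'-delimited parameters.

    Each parameter is returned as (key, value); value is None when the
    parameter has no '=' (for example a bare flag).
    """
    name, *raw_params = segment.split(";")
    params = []
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        params.append((unquote(key), unquote(value) if sep else None))
    return unquote(name), params


print(parse_segment("name;v=1.1"))
```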

巴黎盛开的樱花 2024-08-26 19:09:06

Section 3.3 covers this - it's an opaque delimiter a URI-producing application can use if convenient:

Aside from dot-segments in
hierarchical paths, a path segment is
considered opaque by the generic
syntax. URI producing applications
often use the reserved characters
allowed in a segment to delimit
scheme-specific or
dereference-handler-specific
subcomponents. For example, the
semicolon (";") and equals ("=")
reserved characters are often used to
delimit parameters and parameter
values applicable to that segment. The
comma (",") reserved character is
often used for similar purposes. For
example, one URI producer might use a
segment such as "name;v=1.1" to
indicate a reference to version 1.1 of
"name", whereas another might use a
segment such as "name,1.1" to indicate
the same. Parameter types may be
defined by scheme-specific semantics,
but in most cases the syntax of a
parameter is specific to the
implementation of the URI's
dereferencing algorithm.

探春 2024-08-26 19:09:06

There are some conventions around its current usage that are interesting. These speak to when to use a semicolon or comma. From the book "RESTful Web Services":

Use punctuation characters to separate multiple pieces of data at the same level of hierarchy. Use commas when the order of the items matters, ... Use semicolons when the order doesn't matter.
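One way to picture that convention, as a sketch only: treat a semicolon-separated segment as an unordered set and a comma-separated segment as an ordered tuple when deciding whether two segments name the same thing. The function same_resource and the sample values are hypothetical, chosen for illustration.

```python
def same_resource(segment_a: str, segment_b: str) -> bool:
    """Compare two segments under the comma/semicolon convention."""

    def key(segment: str):
        if ";" in segment:
            # Order does not matter: compare as a set.
            return frozenset(segment.split(";"))
        # Order matters: compare as a sequence.
        return tuple(segment.split(","))

    return key(segment_a) == key(segment_b)


print(same_resource("red;blue", "blue;red"))
print(same_resource("37.0,-95.2", "-95.2,37.0"))
```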

梦里寻她 2024-08-26 19:09:06

Since 2014, path segments have been known to contribute to Reflected File Download attacks. Let's assume we have a vulnerable API that reflects whatever we send to it:

https://google.com/s?q=rfd%22||calc||

{"results":["q", "rfd\"||calc||","I love rfd"]}

Now, this is harmless in a browser: because it's JSON, it's not going to be rendered, and the browser will instead offer to download the response as a file. This is where path segments come to help (for the attacker):

https://google.com/s;/setup.bat;?q=rfd%22||calc||

Everything between the semicolons (;/setup.bat;) is ignored by the web service, but the browser interprets it as the file name under which to save the API response.

Now a file called setup.bat is downloaded, and it runs without the usual warning about the dangers of running files downloaded from the Internet (because its name contains the word "setup"). The contents are interpreted as a Windows batch file, and the calc.exe command runs.

Prevention:

  • sanitize your API's input (in this case, they should just allow alphanumerics); escaping is not sufficient
  • add Content-Disposition: attachment; filename="whatever.txt" on APIs that are not going to be rendered; Google was missing the filename part which actually made the attack easier
  • add X-Content-Type-Options: nosniff header to API responses
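
The two header-based measures above can be sketched as a tiny WSGI application in Python. This is only an illustration: the app name json_api_app, the file name response.json, and the empty result body are assumptions, and input sanitization is not shown.

```python
import json


def json_api_app(environ, start_response):
    """Minimal WSGI app that returns JSON with download-hardening headers."""
    body = json.dumps({"results": []}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json; charset=utf-8"),
        # A fixed filename means the browser does not derive one from the URL path.
        ("Content-Disposition", 'attachment; filename="response.json"'),
        # Ask the browser not to guess a different content type.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server from the standard library.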

浅听莫相离 2024-08-26 19:09:06

I found the following use cases:

It's the final character of an HTML entity:

List of XML and HTML character entity references

To use one of these character entity references in an HTML or XML
document, enter an ampersand followed by the entity name and a
semicolon, e.g., &amp; for the ampersand ("&").

Apache Tomcat 7 (or newer versions?!) uses it as a path parameter:

Three Semicolon Vulnerabilities

Apache Tomcat is one example of a web server that supports "Path
Parameters". A path parameter is extra content after a file name,
separated by a semicolon. Any arbitrary content after a semicolon does
not affect the landing page of a web browser. This means that
http://example.com/index.jsp;derp will still return index.jsp, and not
some error page.

The data URI scheme uses it to separate the MIME type from the parameters that follow it (such as the character set or the base64 marker):

Data URI scheme

It can contain an optional character set parameter, separated from the
preceding part by a semicolon (;) .

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
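
To make the role of the semicolon in a data URI concrete, here is a minimal Python sketch that splits the header on ; and the payload on the first comma. The function name parse_data_uri is hypothetical, and the sketch treats an empty media type as text/plain and skips validation that a real parser would need.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str):
    """Return (media_type, parameters, data_bytes) for a data: URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # The first comma separates the header from the payload.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing comma before the data")
    # Semicolons separate the media type from its parameters.
    media_type, *params = header.split(";")
    is_base64 = "base64" in params
    params = [p for p in params if p != "base64"]
    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return media_type or "text/plain", params, data


print(parse_data_uri("data:text/plain;charset=utf-8;base64,SGVsbG8="))
```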

And there was a bug in IIS 5 and IIS 6 to bypass file upload restrictions:

Unrestricted File Upload

Blacklisting File Extensions This protection might be bypassed by: ...
by adding a semi-colon character after the forbidden extension and
before the permitted one (e.g. "file.asp;.jpg")
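
A defensive check against that kind of name could look like the Python sketch below. It is deliberately strict and purely illustrative: the function is_safe_upload_name and the ALLOWED_EXTENSIONS set are assumptions, and it uses an allow-list instead of the blacklist approach the quote describes as bypassable.

```python
import os

# Hypothetical allow-list; adjust to the file types an application accepts.
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def is_safe_upload_name(filename: str) -> bool:
    """Reject names with semicolons, path separators, NULs, or extra dots."""
    if any(ch in filename for ch in (";", "/", "\\", "\x00")):
        return False
    stem, ext = os.path.splitext(filename)
    # Require a non-empty stem with no further dots, and an allowed extension.
    return bool(stem) and "." not in stem and ext.lower() in ALLOWED_EXTENSIONS


print(is_safe_upload_name("file.asp;.jpg"))
print(is_safe_upload_name("photo.jpg"))
```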

Conclusion:

Avoid semicolons in URLs; otherwise they could accidentally produce an HTML entity or be interpreted as part of a URI scheme.
