Directly from the docs you linked yourself:

> urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
>
> This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.

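To make the quoted difference concrete, here is a minimal sketch, assuming Python 3's `urllib.parse` and a made-up URL on `example.com`. It only illustrates the behaviour described above: `urlparse()` peels the `;params` off the last path segment, while `urlsplit()` leaves the path untouched.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with ";params" on two different path segments.
url = "http://example.com/a;x=1/b;y=2?q=3#frag"

parsed = urlparse(url)
split = urlsplit(url)

# urlparse() splits params off the LAST path segment only.
print(parsed.path)    # /a;x=1/b
print(parsed.params)  # y=2

# urlsplit() keeps every ";..." inside the path and has no params field.
print(split.path)     # /a;x=1/b;y=2
```

Note that the `;x=1` on the first segment stays in the path either way, which is why `urlsplit()` is the better fit when parameters can appear on any segment.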
Given that the documentation you linked doesn't include an example with a nonempty `params`, I was also confused until I found this. (Some history, because I got nerd-sniped.)

I'd never heard of URL "parameters" other than URL path-component params (e.g. `/user/213/settings`) or query params (e.g. `/user?id=213`), and I think they're essentially obsolete.

In the beginning, RFC 1738 defined the HTTP URL to never allow `;` in the path:

> http://<host>:<port>/<path>?<searchpart>
>
> Within the <path> and <searchpart> components, "/", ";", "?" are reserved.

`;` was reserved with special meaning in other schemes, like the ftp:// url-path:

> <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>

Apparently in 1995, RFC 1808 defined URL `params` as a top-level component between `path` and `query`. Then in 1998, RFC 2396 defined URIs as having adjacent top-level components `path` and `query`, where the `path` is defined as multiple `path_segments` that each could include a `param`. Finally, in 2005, RFC 3986 obsoleted RFC 1808 and 2396, defining `URI` similarly to RFC 2396.

The special `;params` syntax is now considered an opaque part of the URI syntax that may be specific to the HTTP(S) scheme or just to some specific implementation:

> Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.

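Since the generic syntax treats each segment as opaque, an application that wants per-segment parameters has to pick them apart itself. The following is only a sketch of one way to do that: `split_segment_params` is a hypothetical helper (not part of `urllib.parse`), the URL is made up, and it assumes the producer uses the `name;key=value` convention from the quote.

```python
from urllib.parse import urlsplit


def split_segment_params(path):
    """Hypothetical helper: return (name, [params]) for each non-empty path segment."""
    result = []
    for segment in path.split("/"):
        if not segment:
            continue  # skip the empty piece before the leading "/"
        # Everything before the first ";" is the name; the rest are parameters.
        name, *params = segment.split(";")
        result.append((name, params))
    return result


# urlsplit() leaves all ";" characters in the path, so nothing is lost here.
path = urlsplit("http://example.com/name;v=1.1/other;a=1;b=2/plain").path
segments = split_segment_params(path)
print(segments)
# [('name', ['v=1.1']), ('other', ['a=1', 'b=2']), ('plain', [])]
```

A producer that uses the comma convention (`name,1.1`) would need a different split rule, which is exactly why the RFC leaves this to the implementation.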
As the documentation says, `urlparse.urlparse` returns a 6-tuple (with an additional params element), whereas `urlparse.urlsplit` returns a 5-tuple. The extra element is:

| Attribute | Index | Value | Value if not present |
|---|---|---|---|
| params | 3 | Parameters for last path element | empty string |

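As a quick check of the table row, here is a small sketch using Python 3's `urllib.parse` (the Python 3 home of the old `urlparse` module) and a made-up URL. It shows that the extra element shifts the positions of everything after the path.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with params on the last path segment.
url = "http://example.com/doc;type=a?x=1"

six = urlparse(url)   # (scheme, netloc, path, params, query, fragment)
five = urlsplit(url)  # (scheme, netloc, path, query, fragment)

print(len(six), len(five))  # 6 5
print(six[3])               # type=a  (params sits at index 3)
print(five[3])              # x=1     (index 3 is already the query)

# When no params are present, the element is an empty string.
print(repr(urlparse("http://example.com/doc").params))  # ''
```

Code that indexes the result positionally will therefore break when switching between the two functions; the named attributes (`.path`, `.query`, and so on) are safer.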
FYI, [RFC 2396](https://www.rfc-editor.org/rfc/rfc2396.html#appendix-C) says this about the _parameter_ component of the URL specification:
> Extensive testing of current client applications demonstrated that
the majority of deployed systems do not use the ";" character to
indicate trailing parameter information, and that the presence of a
semicolon in a path segment does not affect the relative parsing of
that segment. Therefore, parameters have been removed as a separate
component and may now appear in any path segment. Their influence
has been removed from the algorithm for resolving a relative URI
reference.