Encoding rules for URLs with the "javascript:" pseudo-protocol?
Is there any authoritative reference about the syntax and encoding of a URL for the pseudo-protocol javascript:? (I know it's not very well regarded, but it's useful for bookmarklets anyway.)
First, we know that standard URLs follow the syntax:
scheme://username:password@domain:port/path?query_string#anchor
but this format doesn't seem to apply here. Indeed, it seems it would be more correct to speak of a URI instead of a URL: the "unofficial" format listed for it is javascript:{body}.
Which, then, are the valid characters for such a URI (that is, what are the escape/unescape rules) when embedding it in HTML?
Specifically, if I have the code of a JavaScript function and I want to embed it in a javascript: URI, which escape rules should I apply?
Of course one could escape every non-alphanumeric character, but that would be overkill and would make the code unreadable. I want to escape only the necessary characters.
Further, it's clear that it would be bad to use a urlencode/urldecode routine pair (those are meant for query string values): we don't want to decode '+' to a space, for example.
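To make the '+' concern concrete, here is a small sketch. It assumes a JavaScript runtime that provides URLSearchParams (a current browser or Node.js); the sample body string is made up for illustration.

```javascript
// Illustration only: how two standard decoders treat '+'.
const body = "alert(1+2)";

// Plain percent-decoding (what a URI body needs) leaves '+' untouched.
const percentDecoded = decodeURIComponent(body);

// Form/query-string decoding treats '+' as an encoded space,
// which would corrupt JavaScript source.
const formDecoded = new URLSearchParams("code=" + body).get("code");

console.log(percentDecoded);
console.log(formDecoded);
```

The second result no longer contains the '+' operator, which is why a query-string decoder is the wrong tool here.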
Comments (2)
My findings, so far:
First, there are the rules for writing a valid HTML attribute value: but here the standard only requires (if the attribute value is enclosed in quotes) arbitrary CDATA (actually a %URI, but HTML itself does not impose additional validation at its level: any CDATA will validate).
Some examples:
Example (1) is valid. But example (2) is also valid HTML 4.01 Strict. To make it valid XHTML we only need to escape the XML special characters < > & (example 3 is valid XHTML 1.0 Strict).

Now, is example (2) a valid javascript: URI? I'm not sure, but I'd say it's not.

From RFC 2396: a URI is subject to some additional restrictions and, in particular, to escaping/unescaping via %xx sequences. And some characters are always prohibited: among them spaces and {}#.

The RFC also defines a subset of opaque URIs: those that do not have hierarchical components, and for which the separating characters have no special meaning (for example, they don't have a 'query string', so the ? can be used like any non-special character). I assume javascript: URIs should be considered among them.

This would imply that the valid characters inside the 'body' of a javascript: URI are the ones the RFC allows in an opaque part (its reserved and unreserved characters, plus %xx escapes), with the additional restriction that it can't begin with /.

This still leaves out some "important" ASCII characters, for example the {}# mentioned above. Also % (because it's used for escape sequences), double quotes " and (most important) all blanks.

In some respects, this seems quite permissive: it's important to note that + is valid (and hence it should not be 'unescaped' as a space when decoding).

But in other respects, it seems too restrictive. Braces and brackets, especially: I understand that they are normally used unescaped and browsers have no problems.
And what about spaces? Like braces, they are disallowed by the RFC, but I see no problem with them in this kind of URI. However, I see that in most bookmarklets they are escaped as "%20". Is there any (empirical or theoretical) explanation for this?
I still don't know whether there are standard functions (in mainstream languages) that do this escaping/unescaping, or some sample code.
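As a rough sketch of what such escaping could look like in JavaScript itself: the built-in encodeURI leaves letters, digits and the RFC 2396 reserved and "mark" characters alone and percent-encodes everything else, so it comes close to the character set discussed above. The helper names below (toJavascriptUri, fromJavascriptUri, toAttributeValue) are made up for this example and are not a standard API. The sketch follows the RFC 2396 reading given above, not necessarily what every browser requires.

```javascript
// Hypothetical helpers for illustration; the names are not a standard API.

// Percent-encode only what RFC 2396 does not allow in a URI.
// encodeURI leaves letters, digits and ; / ? : @ & = + $ , - _ . ! ~ * ' ( ) #
// untouched and escapes everything else (space, %, ", {}, [], <, > ...)
// as UTF-8 %xx sequences. '#' is escaped separately so that it cannot
// be read as the start of a fragment.
function toJavascriptUri(code) {
  return "javascript:" + encodeURI(code).replace(/#/g, "%23");
}

// Reverse step: plain percent-decoding, which leaves '+' alone.
function fromJavascriptUri(uri) {
  return decodeURIComponent(uri.replace(/^javascript:/i, ""));
}

// Escape the characters that are special inside a double-quoted
// HTML/XHTML attribute value.
function toAttributeValue(uri) {
  return uri
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Made-up sample code to embed.
const code = 'alert("1 + 1 = " + {n: 2}.n);';
const uri = toJavascriptUri(code);
console.log(uri);
console.log('<a href="' + toAttributeValue(uri) + '">run</a>');
```

The sketch does not handle a body that begins with /, which the opaque-URI rule above disallows, and encodeURI throws a URIError if the input contains a lone surrogate.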
javascript: URLs are currently part of the HTML spec and are specified at https://html.spec.whatwg.org/multipage/browsing-the-web.html#the-javascript:-url-special-case