URL encoding rules for the javascript: pseudo-protocol?

Posted 2024-09-10 06:04:16

Is there an authoritative reference for the syntax and encoding of a URL that uses the javascript: pseudo-protocol? (I know it's not well regarded, but it is useful for bookmarklets.)

First, we know that standard URLs follow the syntax:

scheme://username:password@domain:port/path?query_string#anchor

but this format doesn't seem to apply here. Indeed, it seems more correct to speak of a URI rather than a URL: the "unofficial" format is listed as javascript:{body}.

So which characters are valid in such a URI, and what are the escape/unescape rules, when it is embedded in HTML?

Specifically, if I have the code of a JavaScript function and I want to embed it in a javascript: URI, which escape rules should I apply?

Of course one could escape every non-alphanumeric character, but that would be overkill and would make the code unreadable. I want to escape only the necessary characters.

Further, it would clearly be wrong to use an arbitrary urlencode/urldecode routine pair (those are meant for query string values): we don't want to decode '+' to a space, for example.

Comments (2)

世界等同你 2024-09-17 06:04:16

My findings, so far:

First, there are the rules for writing a valid HTML attribute value. Here the standard only requires (if the attribute value is enclosed in quotes) arbitrary CDATA. Formally it is a %URI, but HTML itself does not impose additional validation at its level: any CDATA will validate.

Some examples:

 <a href="javascript:alert('Hi!')">     (1)
 <a href="javascript:if(a > b && 1 < 0) alert(  b ? 'hi' : 'bye')">   (2)
 <a href="javascript:if(a &gt; b &amp;&amp; 1 &lt; 0) alert( b ? 'hi' : 'bye')">  (3)

Example (1) is valid. Example (2) is also valid HTML 4.01 Strict. To make it valid XHTML we only need to escape the XML special characters < > & (example (3) is valid XHTML 1.0 Strict).
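The attribute-level escaping described above can be sketched as follows. `escapeAttributeValue` is a hypothetical helper name, not a standard API, and it also escapes the double quote on the assumption that the attribute is delimited with double quotes.

```javascript
// Escape the XML special characters so the script can sit inside a
// double-quoted attribute value. Hypothetical helper for illustration.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, "&amp;") // must run first, or it would re-escape the others
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const script = "if(a > b && 1 < 0) alert( b ? 'hi' : 'bye')";
const anchor = `<a href="javascript:${escapeAttributeValue(script)}">`;
console.log(anchor);
```

This only covers the HTML/XHTML layer. It says nothing about which characters are valid inside the URI itself.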

Now, is example (2) a valid javascript: URI? I'm not sure, but I'd say it is not.

From RFC 2396: a URI is subject to some additional restrictions, in particular escaping/unescaping via %xx sequences. Some characters are always prohibited, among them spaces and {}# .

The RFC also defines a subset of opaque URIs: those that have no hierarchical components, and for which the separator characters have no special meaning (for example, they don't have a 'query string', so ? can be used like any non-special character). I assume javascript: URIs belong to this subset.
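To see the hierarchical/opaque distinction in practice, here is a small sketch using the `URL` class available in browsers and Node.js. The example URLs are made up for illustration. Note that this parser follows the WHATWG URL Standard rather than RFC 2396, so its treatment of ? and # may differ from the RFC's opaque-URI rules.

```javascript
// A hierarchical URL has an authority, a path, a query and a fragment.
const hierarchical = new URL("http://user:pw@example.com:8080/path?q=1#frag");
// A javascript: URL has only a scheme and an opaque body.
const opaque = new URL("javascript:alert('Hi!')");

console.log(hierarchical.host);     // authority (host and port)
console.log(hierarchical.pathname); // hierarchical path
console.log(opaque.protocol);       // the scheme, including the colon
console.log(opaque.host);           // empty string: no authority component
console.log(opaque.pathname);       // the body after "javascript:"
```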

This would imply that the valid characters inside the 'body' of a javascript: URI are

 a-zA-Z0-9
 -_.!~*'();/?:@&=+$,
 %hh (escape sequence, with two hexadecimal digits)

with the additional restriction that it can't begin with /.
This still leaves out some "important" ASCII characters, for example

{}#[]<>^\

It also leaves out % (except as the start of an escape sequence), the double quote " and, most importantly, all whitespace.
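A minimal sketch of an escaper for this character set, assuming the RFC 2396 reading above is the one you want to follow. It relies on the standard `encodeURI`, which leaves the RFC 2396 reserved and unreserved characters alone but also leaves # unescaped, so # is handled separately. The function names are hypothetical, not a standard API.

```javascript
// Escape the body of a javascript: URI, keeping the RFC 2396 reserved and
// unreserved characters readable. Hypothetical helper for illustration.
function escapeJavascriptUriBody(code) {
  return encodeURI(code)    // escapes space, %, ", {}, [], <>, ^, \ and non-ASCII
    .replace(/#/g, "%23")   // encodeURI leaves # alone; RFC 2396 excludes it
    .replace(/^\//, "%2F"); // an opaque part may not begin with /
}

// decodeURIComponent reverses every %xx sequence and leaves + untouched.
// (decodeURI would not do here: it keeps sequences such as %23 encoded.)
function unescapeJavascriptUriBody(body) {
  return decodeURIComponent(body);
}

const source = "if (a > b) { alert(\"50%\" + '#'); }";
const bookmarklet = "javascript:" + escapeJavascriptUriBody(source);
console.log(bookmarklet);
```

Quotes, parentheses, semicolons and operators such as + and = stay readable, while whitespace, braces and brackets become %xx sequences.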

In some respects, this seems quite permissive: it's important to note that + is valid, and hence it must not be turned into a space when decoding.
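A quick check of the + point, as a sketch: `decodeURIComponent` only reverses %xx sequences, whereas form-style query decoding (shown here via `URLSearchParams`) turns + into a space. The sample strings are made up for illustration.

```javascript
// Percent-decoding leaves + alone, which is what a javascript: body needs.
const percentDecoded = decodeURIComponent("alert(1+2)");

// Query-string (form) decoding treats + as an encoded space and would
// corrupt the script.
const formDecoded = new URLSearchParams("code=alert(1+2)").get("code");

console.log(percentDecoded);
console.log(formDecoded);
```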

But in other respects, it seems too restrictive, in particular for braces and brackets: as far as I know they are normally used unescaped and browsers have no problem with them.

And what about spaces? Like braces, they are disallowed by the RFC, but I see no problem with them in this kind of URI. However, in most bookmarklets they are escaped as "%20". Is there any (empirical or theoretical) explanation for this?

I still don't know whether there are standard functions (in mainstream languages) that perform this escaping/unescaping, or any sample code.

海夕 2024-09-17 06:04:16

javascript: URLs are currently part of the HTML spec and are specified at https://html.spec.whatwg.org/multipage/browsing-the-web.html#the-javascript:-url-special-case
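As I read that section, the browser serializes the URL, removes the leading "javascript:", percent-decodes the remainder and decodes the result as UTF-8 to obtain the script source. The following is a rough sketch of that step, not the spec's algorithm verbatim: `scriptSourceFromJavascriptUrl` is a hypothetical name, and `decodeURIComponent` is stricter than the spec's percent-decoding because it throws on malformed sequences.

```javascript
// Approximate how a javascript: URL is turned into script source.
// Hypothetical helper; a sketch under the assumptions stated above.
function scriptSourceFromJavascriptUrl(urlString) {
  const url = new URL(urlString);        // the parser lowercases the scheme
  if (url.protocol !== "javascript:") {
    return null;                         // not a javascript: URL
  }
  const encodedSource = url.href.slice("javascript:".length);
  return decodeURIComponent(encodedSource); // %xx to UTF-8 text; + is untouched
}

console.log(scriptSourceFromJavascriptUrl("javascript:alert(%22a%20+%20b%22)"));
```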
