URL encoding rules for the "javascript:" pseudo-protocol?

Posted on 2024-09-10 06:04:16

Is there any authoritative reference on the syntax and encoding of a URL for the pseudo-protocol javascript:? (I know it is not well regarded, but it is useful for bookmarklets anyway.)

First, we know that standard URLs follow the syntax:

scheme://username:password@domain:port/path?query_string#anchor

but this format doesn't seem to apply here. Indeed, it seems more correct to speak of a URI rather than a URL: the "unofficial" format is listed as javascript:{body}.

Now, which characters are valid in such a URI (that is, what are the escape/unescape rules) when embedding it in HTML?

Specifically, if I have the code of a JavaScript function and I want to embed it in a javascript: URI, which escape rules should I apply?

Of course one could escape every non-alphanumeric character, but that would be overkill and make the code unreadable. I want to escape only the necessary characters.

Further, it is clear that using a urlencode/urldecode routine pair would be wrong (those are meant for query string values): we don't want to decode '+' to a space, for example.

世界等同你 2024-09-17 06:04:16

My findings, so far:

First, there are the rules for writing a valid HTML attribute value. Here the standard only requires (if the attribute value is enclosed in quotes) arbitrary CDATA (actually a %URI, but HTML itself does not impose additional validation at its level: any CDATA will validate).

Some examples:

 <a href="javascript:alert('Hi!')">     (1)
 <a href="javascript:if(a > b && 1 < 0) alert(  b ? 'hi' : 'bye')">   (2)
 <a href="javascript:if(a&gt;b &amp;&amp; 1 &lt; 0) alert( b ? 'hi' : 'bye')">  (3)

Example (1) is valid. Example (2) is also valid HTML 4.01 Strict. To make it valid XHTML we only need to escape the XML special characters < > & (example 3 is valid XHTML 1.0 Strict).

Now, is example (2) a valid javascript: URI? I'm not sure, but I'd say it's not.
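The attribute-level escaping described above can be sketched as a small helper. This is only an illustration: the name escapeForAttribute is hypothetical, and it assumes the attribute value is wrapped in double quotes (hence the extra handling of ").

```javascript
// Hypothetical helper: escape a javascript: URI so it can be placed inside
// a double-quoted HTML/XHTML attribute value.
function escapeForAttribute(uri) {
  return uri
    .replace(/&/g, "&amp;")  // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;"); // only needed because the attribute is double-quoted
}

const exampleUri = "javascript:if(a > b && 1 < 0) alert( b ? 'hi' : 'bye')";
const exampleAnchor = '<a href="' + escapeForAttribute(exampleUri) + '">';
console.log(exampleAnchor);
```

This layer is independent of URI escaping: the browser undoes the entity escaping when it parses the attribute, before the value is treated as a URI.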

From RFC 2396: a URI is subject to some additional restrictions, in particular escaping/unescaping via %xx sequences. Some characters are always prohibited,
among them spaces and {}# .

The RFC also defines a subset of opaque URIs: those that have no hierarchical components and for which the separator characters have no special meaning (for example, they don't have a 'query string', so ? can be used like any non-special character). I assume javascript: URIs should be counted among them.

This would imply that the valid characters inside the 'body' of a javascript: URI are

 a-zA-Z0-9 
 -_.!~*'();?:@&=+$,/
 %hh : (escape sequence, with two hexadecimal digits)

with the additional restriction that it can't begin with /.
This still leaves out some "important" ASCII characters, for example

{}#[]<>^\

Also excluded are % on its own (because it introduces escape sequences), double quotes " and (most importantly) all blanks.
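As a rough check of the rules above, the following sketch tests whether a body uses only the characters RFC 2396 permits in an opaque part (unreserved characters, reserved characters, and %hh escapes). The names OPAQUE_CHARS and isValidOpaqueBody are hypothetical, and this reflects the RFC grammar only, not what browsers actually accept.

```javascript
// Characters allowed by RFC 2396 in an opaque part:
// alphanumerics, marks (-_.!~*'()), reserved (;/?:@&=+$,), or a %hh escape.
const OPAQUE_CHARS = /^(?:[A-Za-z0-9\-_.!~*'();\/?:@&=+$,]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical validator for the part after "javascript:".
function isValidOpaqueBody(body) {
  // The opaque part needs at least one character and may not begin with "/".
  return body.length > 0 && body[0] !== "/" && OPAQUE_CHARS.test(body);
}

console.log(isValidOpaqueBody("alert('Hi!')"));  // true
console.log(isValidOpaqueBody("alert( 'Hi!' )")); // false: contains spaces
console.log(isValidOpaqueBody("if(a){b()}"));     // false: contains braces
console.log(isValidOpaqueBody("a%20b"));          // true: space written as %20
```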

In some respects, this seems quite permissive: it's important to note that + is valid (and hence it should not be 'unescaped' to a space when decoding).

But in other respects, it seems too restrictive, for braces and brackets in particular: I understand that they are normally used unescaped and browsers have no problems with them.

And what about spaces? Like braces, they are disallowed by the RFC, but I see no problem with them in this kind of URI. However, I see that in most bookmarklets they are escaped as "%20". Is there any (empirical or theoretical) explanation for this?
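The difference between plain percent-decoding and query-string (form) decoding can be seen with two standard JavaScript facilities. This is a small illustration; the variable names are arbitrary.

```javascript
const body = "alert(1+2)";

// Plain percent-decoding only touches %hh sequences, so "+" is left alone.
const percentDecoded = decodeURIComponent(body);

// Form (query-string) decoding, as done by URLSearchParams, turns "+" into a space.
const formDecoded = new URLSearchParams("v=" + body).get("v");

console.log(percentDecoded); // alert(1+2)
console.log(formDecoded);    // alert(1 2)
```

For a javascript: body only the first behavior preserves the code, which is why query-string routines are the wrong tool here.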

I still don't know whether there are standard functions (in mainstream languages) for this escaping/unescaping, or any sample code.
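One possible approach in JavaScript itself, offered as a sketch rather than an authoritative answer: the built-in encodeURI leaves alphanumerics and the characters ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) # unescaped and percent-encodes the rest (including spaces, braces, brackets, double quotes and %). That matches the character set discussed above except for #, which is escaped separately here. The function names toJavascriptUri and fromJavascriptUri are hypothetical, and the sketch does not handle the rule that the body may not begin with /.

```javascript
// Hypothetical helper: build a javascript: URI, escaping only what
// encodeURI considers necessary, plus "#" (which encodeURI leaves alone).
function toJavascriptUri(code) {
  return "javascript:" + encodeURI(code).replace(/#/g, "%23");
}

// Hypothetical inverse: strip the scheme and percent-decode the body.
// decodeURIComponent does not turn "+" into a space.
function fromJavascriptUri(uri) {
  return decodeURIComponent(uri.slice("javascript:".length));
}

const code = "if (a) { alert('x' + 1); }";
const encoded = toJavascriptUri(code);
console.log(encoded); // javascript:if%20(a)%20%7B%20alert('x'%20+%201);%20%7D
console.log(fromJavascriptUri(encoded) === code); // true
```

The result still needs the separate HTML attribute escaping (for & in particular) before it is placed in an href.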
