Remove all inline events from HTML tags

Posted 2024-08-01 16:13:31

For HTML input, I want to neutralize all HTML elements that have inline JS (onclick="...", onmouseout="...", etc.).
I am thinking: isn't it enough to encode the following characters: =, (, )?

So onclick="location.href='ggg.com'"
will become
onclick%3D"location.href%3D'ggg.com'"
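
For illustration, here is a minimal Python sketch of the proposed encoding (the helper name `encode_question_chars` is hypothetical). It reproduces the transformation above, but it also shows two side effects: every `=` in the markup is encoded, so ordinary attributes such as `href` break too, and a `<script>` element that happens to avoid `=`, `(` and `)` passes through unchanged.

```python
def encode_question_chars(markup: str) -> str:
    """Percent-encode only '=', '(' and ')', as proposed in the question (hypothetical helper)."""
    return markup.replace("=", "%3D").replace("(", "%28").replace(")", "%29")


if __name__ == "__main__":
    # The example from the question.
    print(encode_question_chars("onclick=\"location.href='ggg.com'\""))
    # A harmless attribute is mangled as well.
    print(encode_question_chars('<a href="/home">Home</a>'))
    # Script without '=', '(' or ')' is untouched; alert`1` calls alert via a tagged template.
    print(encode_question_chars("<script>alert`1`</script>"))
```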

What am I missing here?

Edit: I do need to accept active HTML (I can't escape all of it or convert it to entities).

Comments (2)

睫毛上残留的泪 2024-08-08 16:13:31

There's no simple way to accept HTML but not scripts.

You have to parse HTML to DOM, remove all unwanted elements and attributes in DOM and generate new HTML.
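
A minimal sketch of that approach in Python, using only the standard library. Assumptions: the allowlists are hypothetical examples, and `html.parser.HTMLParser` is a tokenizer rather than a DOM builder, so this rebuilds the output from the token stream instead of editing a tree. Anything not explicitly allowed, including every `on*` attribute, is dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; adjust them to what your site needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li"}
ALLOWED_ATTRS = {"title"}  # href, src and style are deliberately not allowed here
VOID_TAGS = {"br"}  # elements that never get a closing tag
DROP_WITH_CONTENT = {"script", "style"}  # discard these elements and their text


class AllowlistSanitizer(HTMLParser):
    """Rebuild markup from the token stream, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself but keep its text content
        parts = [f"<{tag}"]
        for name, value in attrs:
            if name in ALLOWED_ATTRS:  # every on* attribute falls through here
                safe_value = escape(value or "", quote=True)
                parts.append(f' {name}="{safe_value}"')
        parts.append(">")
        self.out.append("".join(parts))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>.
        if tag in DROP_WITH_CONTENT:
            return
        self.handle_starttag(tag, attrs)
        if tag not in VOID_TAGS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was entity-decoded by the parser, so re-escape it on the way out.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments and doctypes are ignored by the base class, so they are dropped.


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="alert(1)" title="hi">Hello <b>world</b><script>alert(2)</script><img src=x onerror=alert(3)></p>')` returns `<p title="hi">Hello <b>world</b></p>`. It does not repair unbalanced tags, which a DOM-based sanitizer would.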

It can't be done reliably with regular expressions.

Removing on* attributes is not enough. Scripts can also be embedded in style, src, href and other attributes.
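
Because scripts can also arrive through `href` and `src`, a sanitizer that allows those attributes needs to check their values. One common approach, sketched here in Python, is an allowlist of URL schemes; the allowed set and the name `is_safe_url` are assumptions. It decodes character references and removes ASCII whitespace and control characters before reading the scheme, because `JaVaScRiPt:`, `java&#x09;script:` and `&#106;avascript:` can all still act as `javascript:` URLs in a browser. This covers URL attributes only; `style` is assumed to be stripped or handled separately.

```python
import html
import re

# Hypothetical allowlist of URL schemes that cannot run script.
SAFE_SCHEMES = {"http", "https", "mailto"}

# ASCII control characters and spaces (0x00-0x20) plus DEL. Browsers ignore some of
# these inside URLs, so all of them are removed before the scheme is inspected.
_IGNORABLE = re.compile(r"[\x00-\x20\x7f]+")
_SCHEME = re.compile(r"([a-z][a-z0-9+.\-]*):")


def is_safe_url(raw_value: str) -> bool:
    """Return True if an href/src value is relative or uses an allowlisted scheme."""
    decoded = html.unescape(raw_value)  # skip this if your parser already decoded entities
    compact = _IGNORABLE.sub("", decoded).lower()
    match = _SCHEME.match(compact)
    if match is None:
        return True  # no scheme: a relative URL such as "/about" or "page.html"
    return match.group(1) in SAFE_SCHEMES
```

The check is deliberately conservative: stripping every space may reject a few odd but harmless values, which is the safer failure mode.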

If you're using PHP, then use HTML Purifier.

幸福%小乖 2024-08-08 16:13:31

You probably have a couple of options... The easiest way is to convert quotes, and possibly the < and > characters, to their HTML-encoded equivalents (&quot;, &lt;, &gt;, etc.), which will result in the HTML code being displayed literally.
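
As a sketch in Python, the standard-library function `html.escape()` plays roughly the role that `htmlspecialchars()` plays in PHP; this shows the "display it literally" option:

```python
from html import escape

user_input = '<a href="#" onclick="alert(\'hi\')">hi</a>'

# quote=True (the default) also converts " and ', so the result is safe inside attribute values.
print(escape(user_input, quote=True))
# prints: &lt;a href=&quot;#&quot; onclick=&quot;alert(&#x27;hi&#x27;)&quot;&gt;hi&lt;/a&gt;
```

Nothing survives as markup here, so this only fits when active HTML does not need to be accepted.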

Tell me which server-side language you are using and I can point you towards more language-specific information, if you like. (For example, PHP has htmlspecialchars()[1].)

EDIT: I just actually read your question. Okay, you want to allow HTML through but no JavaScript? Well, since no simple solution jumps to mind, I suggest just using string replacement (regular expressions, if you can) to get rid of them entirely.

There is a finite set of HTML event handler attributes. Couple that with matching the quotation marks around their values and you're probably good.

For proof of concept, in Perl, you'd probably do something like this:

$myInput =~ s/\bon(?:mouseover|mouseout|click|focus|blur)\s*=\s*(?:"[^"]*"|'[^']*')\s*//gi;  # add the remaining handler names to the alternation

So, match the event handler name (only some of which I included), an equals sign, then a value quoted with either single or double quotes, plus optional trailing whitespace, and replace the entire thing with nothing (i.e., delete it).
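
The same substitution, ported to Python's `re` module as a sketch. The handler list is the partial one from the Perl line (extend it for real use), and, like the Perl version, the pattern assumes the attribute value is quoted.

```python
import re

# Partial list of handler names, as in the Perl example.
HANDLER_ATTR = re.compile(
    r"""\bon(?:mouseover|mouseout|click|focus|blur)  # event handler name
        \s*=\s*                                      # equals sign
        (?:"[^"]*"|'[^']*')                          # double- or single-quoted value
        \s*                                          # optional trailing whitespace
    """,
    re.IGNORECASE | re.VERBOSE,
)


def strip_quoted_handlers(markup: str) -> str:
    """Delete the listed on* attributes when their value is quoted."""
    return HANDLER_ATTR.sub("", markup)


print(strip_quoted_handlers('<a href="#" onclick="go()">x</a>'))
# prints: <a href="#" >x</a>
```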

That won't work for something requiring more levels of quotation, though, since eventually you would come back to the original delimiters. Forgive the contrived and completely useless example:

onclick="eval('3+prompt("Enter a number: ")')"
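
Running a quote-matching pattern of that kind on this example (a Python sketch; the pattern is repeated so the snippet runs on its own) shows the problem: the double-quoted alternative stops at the first inner `"`, so only part of the attribute is removed and the rest leaks into the tag.

```python
import re

# A handler name, "=", then one single- or double-quoted value (partial handler list).
HANDLER_ATTR = re.compile(
    r"""\bon(?:mouseover|mouseout|click|focus|blur)\s*=\s*(?:"[^"]*"|'[^']*')\s*""",
    re.IGNORECASE,
)

tag = """<button onclick="eval('3+prompt("Enter a number: ")')">Go</button>"""
print(HANDLER_ATTR.sub("", tag))
# prints: <button Enter a number: ")')">Go</button>
```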

In THAT case, you might want to write a loop that parses the string first by word (i.e., looking for the event handler name), then character by character, keeping track of the number of quoting levels and of the current delimiter as you go:

  1. Mark the index of the beginning of the handler name (the "o" in onclick, etc.).
  2. Start with quoting level 0 (or 1 after you've processed the opening quotation delimiter).
  3. If the current delimiter is " and you see ', increase the quoting level by 1 and switch the current delimiter to '.
  4. If the current delimiter is " and you see ", decrease the quoting level by 1 and switch the current delimiter back to the enclosing one, '.
  5. If the current delimiter is ' and you see ", increase the quoting level by 1 and switch the current delimiter to ".
  6. If the current delimiter is ' and you see ', decrease the quoting level by 1 and switch the current delimiter back to the enclosing one, ".
  7. If the quoting level gets back down to 0, your string has ended. Mark the index where the string ends.
  8. Use a string manipulation function to cut out the substring from the first index to the last index.
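
A Python sketch of the steps above, under the same assumption that the markup is well-formed. Assumptions beyond the answer: the handler names are a hypothetical partial list, `quoted_value_end` and `strip_nested_handlers` are illustrative names, and malformed values raise `ValueError` so the caller can reject the input. Because the delimiters alternate, every quote character either closes the current level or opens a new one, and in both cases the current delimiter flips to the other quote.

```python
import re

# Hypothetical partial list of handler names, followed by "=" and optional whitespace.
HANDLER_START = re.compile(r"\bon(?:mouseover|mouseout|click|focus|blur)\s*=\s*", re.IGNORECASE)
QUOTES = "\"'"


def quoted_value_end(s: str, i: int) -> int:
    """Given the index of the opening quote, return the index just past its matching close."""
    if i >= len(s) or s[i] not in QUOTES:
        raise ValueError(f"expected a quoted value at index {i}")
    current, level = s[i], 1  # step 2: level 1 after the opening delimiter
    for j in range(i + 1, len(s)):
        ch = s[j]
        if ch not in QUOTES:
            continue
        # Steps 3-6: the current delimiter closes a level, the other quote opens one.
        level += -1 if ch == current else 1
        current = "'" if current == '"' else '"'
        if level == 0:  # step 7: back to level 0, the value has ended
            return j + 1
    raise ValueError("unterminated attribute value")


def strip_nested_handlers(markup: str) -> str:
    """Remove the listed on* attributes, even when their values nest alternating quotes."""
    pieces, pos = [], 0
    while (m := HANDLER_START.search(markup, pos)) is not None:
        end = quoted_value_end(markup, m.end())  # step 1: m.start() is the "o" in onclick
        pieces.append(markup[pos:m.start()])  # step 8: keep everything outside the span
        pos = end
    pieces.append(markup[pos:])
    return "".join(pieces)
```

On the contrived example, `strip_nested_handlers("""<button onclick="eval('3+prompt("Enter a number: ")')">Go</button>""")` returns `<button >Go</button>`.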

It's a little more time-consuming, but it should theoretically work no matter what, assuming the HTML is well-formed. (That's a horrible assumption, but if it's not well-formed you could just reject the input anyway!)

[1] https://www.php.net/manual/en/function.htmlspecialchars.php
