Is it sensible to whitelist tags using jQuery? Are there existing solutions in JavaScript?

Posted 2024-10-23 11:46:38

My problem

I want to clean HTML pasted in a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps another with attributes). This is not primarily in order to prevent XSS, but to remove ugly HTML.

Currently I see no way to do it on the server, so I guess it must be done in JavaScript.

Current ideas

I found the jquery-clean plugin, but as far as I can see, it is using regexes to do the work, and we know that is not safe.

As I've not found any other JS-based solution, I've started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted HTML ($(pastedHtml)) and then traversing the resulting tree, removing each element that does not match the whitelist by looking at its tagName property.

My questions

  • Is this any better?
  • Can I trust jQuery to represent the pasted
    content well (there may be unmatched
    ending tags and what-have-you)?
  • Is there a better solution already that
    I couldn't find?

Update

This is my current, jQuery-based, solution (verbose and not extensively tested):

function clean(element, whitelist, replacerTagName) {
    // Use div if no replace tag was specified
    replacerTagName = replacerTagName || "div";

    // Accept anything that jQuery accepts
    var jq = $(element);    

    // Create a copy of the current element, but without its child elements
    var clone = jq.clone();
    clone.children().remove();

    // Wrap the copy in a dummy parent to be able to search with jQuery selectors
    // 1)
    var wrapper = $('<div/>').append(clone);

    // Check if the element is not on the whitelist by searching with the 'not' selector
    var invalidElement = wrapper.find(':not(' + whitelist + ')');

    // If the element wasn't on the whitelist, replace it.
    if (invalidElement.length > 0) {
        // 2)
        var el = $('<' + replacerTagName + '/>');
        el.text(invalidElement.text());
        invalidElement.replaceWith(el);
    }

    // Extract the (maybe replaced) element
    var cleanElement = $(wrapper.children().first());

    // Recursively clean the children of the original element and
    // append them to the cleaned element
    var children = jq.children();
    if (children.length > 0) {
        children.each(function(_index, thechild) {
            var cleaned = clean(thechild, whitelist, replacerTagName);
            cleanElement.append(cleaned);
        });
    }
    return cleanElement;
}

I am wondering about some points (see the numbered comments in the code):

  1. Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
  2. Is this the recommended way to create a new node?

Comments (1)

浅笑依然 2024-10-30 11:46:38

If you leverage the browser's HTML-correcting abilities (e.g. you copy the rich text to the innerHTML of an empty div and take the resulting DOM tree), the HTML will be guaranteed to be valid (the way it is corrected is somewhat browser-dependent). This is probably already done by the rich text editor anyway, though.
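A minimal sketch of that idea, not taken from any editor's code: the helper name parsePastedHtml and its optional doc parameter are assumptions made for illustration (doc only exists so the function can be tried outside a page; in a browser the global document is used).

```javascript
// Let the browser's HTML parser repair the pasted markup by assigning
// it to the innerHTML of a detached <div>.
// `parsePastedHtml` and `doc` are hypothetical names for this sketch.
function parsePastedHtml(pastedHtml, doc) {
    // Fall back to the page's document when none is passed in.
    doc = doc || document;

    // The div is never attached to the page, so nothing is rendered.
    var container = doc.createElement("div");

    // The parser closes unclosed tags and drops stray end tags here;
    // exactly how it repairs them depends on the browser.
    container.innerHTML = pastedHtml;

    // The children of `container` are now a well-formed DOM tree.
    return container;
}
```

In a page, parsePastedHtml("<p>one<p>two</b>").childNodes would then hold the corrected nodes, ready to be walked with plain DOM calls or wrapped with $(...).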

jQuery's own text-to-DOM transform is probably also safe, but definitely slower, so I would avoid it.

Using a whitelist based on the jQuery selector engine might be somewhat tricky, because removing an element while preserving its children might make the document invalid, so the browser would correct it by changing the DOM tree, which might confuse a script trying to iterate through invalid elements. (For example, suppose you allow ul and li but not ol: the script removes the list root element, naked li elements are invalid so the browser wraps them in ul again, and that ul will be missed by the cleaning script.) If you throw away unwanted elements together with all their children, I don't see any problems with that.
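The "throw away unwanted elements together with all their children" variant could be sketched roughly as follows. This is an illustration under assumptions, not a tested cleaner: the name dropDisallowed and the array-of-tag-names whitelist format are made up here, and only standard DOM members (childNodes, nodeType, tagName, removeChild) are used.

```javascript
// Remove every element whose tag is not whitelisted, together with its
// whole subtree. `dropDisallowed` is a hypothetical name; `allowedTags`
// is assumed to be an array of tag names such as ["ul", "li"].
function dropDisallowed(root, allowedTags) {
    // Build a lookup table keyed by lower-case tag name.
    var allowed = {};
    for (var i = 0; i < allowedTags.length; i++) {
        allowed[allowedTags[i].toLowerCase()] = true;
    }

    // Copy the live childNodes list first: removing nodes while
    // iterating over the live list would skip siblings.
    var children = Array.prototype.slice.call(root.childNodes);

    children.forEach(function (child) {
        if (child.nodeType !== 1) {
            return; // not an element (text, comment, ...): keep it
        }
        if (allowed[child.tagName.toLowerCase()] === true) {
            // Allowed element: keep it and clean its descendants.
            dropDisallowed(child, allowedTags);
        } else {
            // Disallowed element: discard it with all its children.
            root.removeChild(child);
        }
    });

    return root;
}
```

In a page it might be called as dropDisallowed(container, ["p", "ul", "li"]), where container is a detached div holding the parsed paste. Because nothing is unwrapped, no child ends up under a parent it was not parsed into. Note that comment nodes are kept by this sketch, and attributes are not filtered at all.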
