Is whitelisting tags using jQuery sane? Are there existing solutions in JavaScript?

Posted on 2024-10-23 11:46:38

My problem

I want to clean HTML pasted in a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps another with attributes). This is not primarily in order to prevent XSS, but to remove ugly HTML.

Currently I see no way to do it on the server, so I guess it must be done in JavaScript.

Current ideas

I found the jquery-clean plugin, but as far as I can see, it uses regexes to do the work, and we know that processing HTML with regexes is not safe.

As I've not found any other JS-based solution, I've started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted HTML ($(pastedHtml)) and then traversing the resulting tree, removing each element that does not match the whitelist by looking at its tagName property.
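
A minimal sketch of that traversal, written against the plain DOM API rather than jQuery. The function name removeDisallowed and the array form of the whitelist are assumptions made for illustration. Note that this version drops a disallowed element together with its whole subtree, which is simpler than replacing it with a div as in the code further down.

```javascript
// Hypothetical helper: removes every descendant element of `root`
// whose tag name is not in `allowedTags` (an array such as ['p', 'b']).
function removeDisallowed(root, allowedTags) {
  // tagName is reported in upper case for HTML elements, so normalise once.
  const allowed = new Set(allowedTags.map((tag) => tag.toUpperCase()));

  function visit(parent) {
    // Copy the child list first: removing nodes while iterating a live
    // collection would skip elements.
    for (const child of Array.from(parent.children)) {
      if (allowed.has(child.tagName.toUpperCase())) {
        visit(child); // allowed: keep it and check its descendants
      } else {
        parent.removeChild(child); // not allowed: drop it with its subtree
      }
    }
  }

  visit(root);
  return root;
}

// Possible use in a browser (not executed here):
//   const container = document.createElement('div');
//   container.innerHTML = pastedHtml;
//   removeDisallowed(container, ['p', 'b', 'i', 'ul', 'li']);
```

Text nodes are untouched because only `children` (element nodes) are inspected; attributes would need a separate pass.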

My questions

Is this any better?
Can I trust jQuery to represent the pasted content well (there may be unmatched ending tags and what-have-you)?
Is there a better solution already that I couldn't find?

Update

This is my current, jQuery-based, solution (verbose and not extensively tested):

function clean(element, whitelist, replacerTagName) {
    // Use div if no replace tag was specified
    replacerTagName = replacerTagName || "div";

    // Accept anything that jQuery accepts
    var jq = $(element);

    // Create a copy of the current element, but without its child elements
    var clone = jq.clone();
    clone.children().remove();

    // Wrap the copy in a dummy parent to be able to search with jQuery selectors
    // 1)
    var wrapper = $('<div/>').append(clone);

    // Check if the element is not on the whitelist by searching with the 'not' selector
    var invalidElement = wrapper.find(':not(' + whitelist + ')');

    // If the element wasn't on the whitelist, replace it.
    if (invalidElement.length > 0) {
        var el = $('<' + replacerTagName + '/>');
        el.text(invalidElement.text());
        invalidElement.replaceWith(el);
    }

    // Extract the (maybe replaced) element
    var cleanElement = $(wrapper.children().first());

    // Recursively clean the children of the original element and
    // append them to the cleaned element
    var children = jq.children();
    if (children.length > 0) {
        children.each(function(_index, thechild) {
            var cleaned = clean(thechild, whitelist, replacerTagName);
            cleanElement.append(cleaned);
        });
    }
    return cleanElement;
}

I am wondering about some points (see the comments in the code):

Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
Is this the recommended way to create a new node?
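
On the first point, one option worth trying is jQuery's .is(), which tests the wrapped element itself against a selector, whereas .find() only searches descendants (which is why the dummy parent was needed). The sketch below assumes jQuery is loaded as the global $; isWhitelisted is a hypothetical helper name, not part of jQuery.

```javascript
// Assumes jQuery is already loaded and available as the global `$`.
// `isWhitelisted` is a hypothetical helper, not a jQuery function.
function isWhitelisted(element, whitelist) {
  // .is() checks the element itself against the selector string,
  // e.g. whitelist = "p, b, ul, li".
  return $(element).is(whitelist);
}

// Possible use inside clean(), instead of the wrapper + find(':not(...)') step:
//   if (!isWhitelisted(clone, whitelist)) {
//     // build the replacement node and use it directly as the cleaned element
//   }
```

On the second point, passing a single-tag string such as '<div/>' to $() is a documented way to create an element in jQuery. Building that string by concatenation is only reasonable while replacerTagName comes from trusted code rather than from the pasted content.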

浅笑依然 2024-10-30 11:46:38

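If you leverage the browser's HTML-correcting abilities (for example, by copying the rich text into the innerHTML of an empty div and taking the resulting DOM tree), the HTML is guaranteed to be valid, although exactly how it is corrected is somewhat browser-dependent. The rich editor probably does this already, though.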
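
The innerHTML approach described above can be sketched as follows. The function name parseToContainer and the optional doc parameter are assumptions for illustration; the parameter only exists so that the sketch can be exercised with a stand-in outside a browser.

```javascript
// Hypothetical helper: lets the browser's HTML parser repair the markup
// by assigning it to the innerHTML of a detached, empty div.
function parseToContainer(html, doc) {
  // `doc` defaults to the page's own document when omitted.
  const container = (doc || document).createElement('div');
  // The parser closes unclosed tags and ignores stray end tags here;
  // the exact repairs depend on the browser.
  container.innerHTML = html;
  return container;
}

// Possible use in a browser (not executed here):
//   const container = parseToContainer('<p>one<p>two<b>bold');
//   container.childNodes  // the repaired DOM tree to clean
//   container.innerHTML   // the repaired markup, re-serialised
```

The div is never attached to the page, so the pasted markup is parsed without being displayed.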

jQuery's own text-to-DOM conversion is probably safe as well, but it is definitely slower, so I would avoid it.

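Using a whitelist based on the jQuery selector engine might be somewhat tricky, because removing an element while keeping its children can make the document invalid. The browser would then correct it by changing the DOM tree, which might confuse a script that is trying to iterate over the invalid elements. (For example, you allow ul and li but not ol; the script removes the list root element, the bare li elements are invalid, so the browser wraps them in a ul again, and the cleaning script misses that ul.) If you discard unwanted elements together with all of their children, I don't see any problems.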
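
To make the ul/li/ol example concrete, here is a minimal sketch of the "remove the element but keep its children" variant, using the plain DOM API. The name unwrapDisallowed is an assumption. Descendants are cleaned before their parent is unwrapped, so the script itself does not depend on the order in which nodes are hoisted; whether a given browser or editor later re-wraps the bare li elements is not something this sketch addresses.

```javascript
// Hypothetical helper: replaces every disallowed element under `root`
// with its own children, keeping text nodes in place.
function unwrapDisallowed(root, allowedTags) {
  const allowed = new Set(allowedTags.map((tag) => tag.toUpperCase()));

  function visit(parent) {
    // Snapshot the children, since hoisting nodes changes the live list.
    for (const child of Array.from(parent.childNodes)) {
      if (child.nodeType !== 1) continue; // 1 = element; keep text etc.
      visit(child); // clean the subtree first (post-order)
      if (!allowed.has(child.tagName.toUpperCase())) {
        // Move the already-cleaned children up in front of the element,
        // then remove the now-empty element itself.
        while (child.firstChild) {
          parent.insertBefore(child.firstChild, child);
        }
        parent.removeChild(child);
      }
    }
  }

  visit(root);
  return root;
}

// With ['ul', 'li'] allowed, <ol><li>a</li><li>b</li></ol> inside `root`
// ends up as two bare li elements that are direct children of `root`.
```

Dropping the whole subtree instead, as the last sentence above suggests, avoids leaving such orphaned children behind, at the cost of losing their text.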