Is it wise to use jQuery for whitelisting tags? Are there existing solutions in JavaScript?
My problem
I want to clean HTML pasted in a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps another with attributes). This is not primarily in order to prevent XSS, but to remove ugly HTML.
Currently I see no way to do it on the server, so I guess it must be done in JavaScript.
Current ideas
I found the jquery-clean plugin, but as far as I can see, it is using regexes to do the work, and we know that is not safe.
As I have not found any other JS-based solution, I've started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted HTML ($(pastedHtml)) and then traversing the resulting tree, removing each element that does not match the whitelist by looking at its tagName attribute.
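The traversal described above can be sketched without jQuery. This is only an illustration, not the jquery-clean plugin or the code from this question: the helper name findDisallowed and the example whitelist are hypothetical, and the function only reads the standard nodeType, nodeName and childNodes properties, so it should work on a real DOM tree as well as on plain objects of the same shape.

```javascript
// Hypothetical helper: collect every element whose tag is not on the whitelist.
// It does not modify the tree; deciding what to do with the matches is left open.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers

function findDisallowed(root, allowedTags) {
  const allowed = new Set(allowedTags.map((t) => t.toUpperCase()));
  const found = [];
  (function walk(node) {
    // Array.from accepts both a live NodeList and a plain array.
    for (const child of Array.from(node.childNodes || [])) {
      if (child.nodeType !== ELEMENT_NODE) continue; // skip text and comments
      if (!allowed.has(String(child.nodeName).toUpperCase())) {
        found.push(child);
      }
      walk(child); // descend in document order
    }
  })(root);
  return found;
}
```

In a browser, root could be a detached div whose innerHTML was set to the pasted markup, for example findDisallowed(div, ['p', 'b', 'i', 'ul', 'li']).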
My questions
- Is this any better?
- Can I trust jQuery to represent the pasted content well (there may be unmatched ending tags and what-have-you)?
- Is there a better solution already that I couldn't find?
Update
This is my current, jQuery-based, solution (verbose and not extensively tested):
    function clean(element, whitelist, replacerTagName) {
        // Use div if no replace tag was specified
        replacerTagName = replacerTagName || "div";
        // Accept anything that jQuery accepts
        var jq = $(element);
        // Create a copy of the current element, but without its children
        var clone = jq.clone();
        clone.children().remove();
        // Wrap the copy in a dummy parent to be able to search with jQuery selectors
        // 1)
        var wrapper = $('<div/>').append(clone);
        // Check if the element is not on the whitelist by searching with the 'not' selector
        var invalidElement = wrapper.find(':not(' + whitelist + ')');
        // If the element wasn't on the whitelist, replace it.
        if (invalidElement.length > 0) {
            var el = $('<' + replacerTagName + '/>');
            el.text(invalidElement.text());
            invalidElement.replaceWith(el);
        }
        // Extract the (maybe replaced) element
        var cleanElement = $(wrapper.children().first());
        // Recursively clean the children of the original element and
        // append them to the cleaned element
        var children = jq.children();
        if (children.length > 0) {
            children.each(function(_index, thechild) {
                var cleaned = clean(thechild, whitelist, replacerTagName);
                cleanElement.append(cleaned);
            });
        }
        return cleanElement;
    }
I am wondering about some points (see the comments in the code):
- Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
- Is this the recommended way to create a new node?
Comments (1)
If you leverage the browser's HTML-correcting abilities (e.g. you copy the rich text to the innerHTML of an empty div and take the resulting DOM tree), the HTML will be guaranteed to be valid (the way it will be corrected is somewhat browser-dependent). The rich text editor probably does this already, though.
jQuery's own text-to-DOM transform is probably also safe, but definitely slower, so I would avoid it.
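As a rough sketch of the innerHTML approach mentioned above: the helper name parsePastedHtml and its optional doc parameter are assumptions made for illustration (the parameter only exists so that the function can be exercised outside a browser); createElement and innerHTML are the standard DOM members.

```javascript
// Sketch: let the browser's own parser repair the markup by assigning it to
// the innerHTML of a detached div. `doc` is a hypothetical parameter; when it
// is omitted, the global document of the page is used.
function parsePastedHtml(html, doc) {
  const container = (doc || document).createElement('div');
  container.innerHTML = html; // unmatched tags and the like are fixed up here
  return container;           // walk container.childNodes from this point
}
```

How exactly the markup is repaired depends on the browser's parser, so the resulting tree is best inspected rather than assumed.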
Using a whitelist based on the jQuery selector engine might be somewhat tricky, because removing an element while preserving its children might make the document invalid, so the browser would correct it by changing the DOM tree, which might confuse a script trying to iterate through invalid elements. (E.g. you allow ul and li but not ol; the script removes the list root element, naked li elements are invalid so the browser wraps them in ul again, and that ul will be missed by the cleaning script.) If you throw away unwanted elements together with all their children, I don't see any problems with that.
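The difference between the two policies can be illustrated on a simplified tree. The node shape used here ({ tag, children } for elements, plain strings for text) and the function name filterTree are hypothetical and stand in for the DOM; the sketch only shows that keeping the children of a removed ol leaves a naked li behind, while dropping the whole subtree does not.

```javascript
// Hypothetical node shape for illustration only: { tag, children } for
// elements and plain strings for text. This is not a real DOM.
function filterTree(nodes, allowedTags, policy) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === 'string') {
      out.push(node); // text is always kept
    } else if (allowedTags.includes(node.tag)) {
      out.push({
        tag: node.tag,
        children: filterTree(node.children, allowedTags, policy),
      });
    } else if (policy === 'unwrap') {
      // Remove the element but keep its (filtered) children in its place.
      out.push(...filterTree(node.children, allowedTags, policy));
    }
    // policy 'drop': the element is discarded together with its subtree.
  }
  return out;
}

const pasted = [
  { tag: 'p', children: ['intro'] },
  { tag: 'ol', children: [{ tag: 'li', children: ['one'] }] },
];
const allowed = ['p', 'ul', 'li'];

const unwrapped = filterTree(pasted, allowed, 'unwrap');
const dropped = filterTree(pasted, allowed, 'drop');

console.log(JSON.stringify(unwrapped));
// [{"tag":"p","children":["intro"]},{"tag":"li","children":["one"]}]
console.log(JSON.stringify(dropped));
// [{"tag":"p","children":["intro"]}]
```

With the 'unwrap' policy the li ends up at the top level without a list parent, which is the invalid structure described above; with 'drop' the list content is lost, but the result stays structurally sound.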