Django templatetag for rendering a subset of html
I have some html (in this case created via TinyMCE) that I would like to add to a page. However, for security reasons, I don't want to just print everything the user has entered.
Does anyone know of a templatetag (a filter, preferably) that will allow only a safe subset of html to be rendered?
I realize that markdown and others do this. However, they also add additional markup syntax which could be confusing for my users, since they are using a rich text editor that doesn't know about markdown.
Comments (3)
There's removetags, but it's a blacklisting approach which fails to remove tags when they don't look exactly like the well-formed tags Django expects, and of course since it doesn't attempt to remove attributes it is totally vulnerable to the 1,000 other ways of script-injection that don't involve the <script> tag. It's a trap, offering the illusion of safety whilst actually providing no real security at all.
HTML-sanitisation approaches based on regex hacking are almost inevitably a total fail. Using a real HTML parser to get an object model for the submitted content, then filtering and re-serialising in a known-good format, is generally the most reliable approach.
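To make the failure modes concrete, here is a small sketch of a regex-based tag blacklist. The function naive_strip_tags is a hypothetical stand-in written for this illustration; it is not Django's removetags implementation, which may differ in detail. It is only meant to show the two weaknesses described above: tags that are not well-formed, and script injection through attributes.

```python
import re


def naive_strip_tags(html, tags):
    """Remove opening and closing tags whose names are in `tags`.

    Hypothetical blacklist filter for illustration only; this is an
    assumption about how such a filter might look, not Django's code.
    """
    names = "|".join(re.escape(t) for t in tags)
    pattern = re.compile(r"</?(?:%s)(?:\s[^>]*)?/?>" % names, re.IGNORECASE)
    return pattern.sub("", html)


# 1. Attribute-based injection: no blacklisted tag is involved,
#    so the event handler passes through untouched.
payload = '<img src="x" onerror="alert(1)">'
cleaned = naive_strip_tags("<script>alert(1)</script>" + payload, ["script"])

# 2. Malformed input: removing the inner tags reassembles the outer ones.
nested = "<scr<script>ipt>alert(1)</scr</script>ipt>"
reassembled = naive_strip_tags(nested, ["script"])
```

In the first case the img element and its onerror handler survive; in the second, a single removal pass leaves a complete script element behind.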
If your rich text editor outputs XHTML it's easy, just use minidom or etree to parse the document then walk over it removing all but known-good elements and attributes and finally convert back to safe XML. If, on the other hand, it spits out HTML, or allows the user to input raw HTML, you may need to use something like BeautifulSoup on it. See this question for some discussion.
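For the XHTML case, the parse, filter and re-serialise idea might look roughly like the sketch below, using etree from the standard library. The whitelist contents, the URL scheme check and the function names are assumptions chosen for this example, not a vetted policy. Note also that ElementTree rejects undefined entities such as &nbsp;, so real editor output may need those handled before parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist; adjust to what your editor is allowed to produce.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


def _safe_attrs(elem):
    """Return only the whitelisted attributes of elem."""
    allowed = ALLOWED_ATTRS.get(elem.tag, set())
    attrs = {}
    for name, value in elem.attrib.items():
        if name not in allowed:
            continue
        # Drop links with schemes such as javascript:
        if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
            continue
        attrs[name] = value
    return attrs


def _copy_children(src, dst):
    """Copy whitelisted children of src into dst, building a new tree."""
    dst.text = src.text
    last = None
    for child in src:
        if isinstance(child.tag, str) and child.tag in ALLOWED_TAGS:
            new = ET.SubElement(dst, child.tag, _safe_attrs(child))
            _copy_children(child, new)
            new.tail = child.tail
            last = new
        elif child.tail:
            # Unknown element: drop it and its subtree, keep the text after it.
            if last is None:
                dst.text = (dst.text or "") + child.tail
            else:
                last.tail = (last.tail or "") + child.tail


def sanitize_xhtml(fragment):
    """Return a cleaned copy of an XHTML fragment wrapped in a div.

    Input that is not well-formed XML is rejected and yields "".
    """
    try:
        root = ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return ""
    clean = ET.Element("div")
    _copy_children(root, clean)
    return ET.tostring(clean, encoding="unicode")


dirty = ('<p>Hello <strong onclick="x()">world</strong>'
         '<script>alert(1)</script>!</p>'
         '<a href="javascript:alert(1)">x</a>')
result = sanitize_xhtml(dirty)
```

This sketch builds a fresh tree from whitelisted pieces instead of deleting from the parsed one, so anything it does not recognise is simply never copied. To use it from a template you would still need to wrap it in a custom filter and mark the result as safe, which is not shown here.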
Filtering HTML is a large and complicated topic, which is why many people prefer the text-with-restrictive-markup languages.
Use HTML Purifier, html5lib, or another library that is built to do HTML sanitization.
You can use removetags to specify a list of tags to be removed: