当我需要从所见即所得编辑器渲染 HTML 时,如何防止 XSS 攻击?

发布于 2024-11-26 13:01:06 字数 359 浏览 2 评论 0原文

非技术背景信息:我在一所学校工作,我们正在使用 Django 构建一个新网站。学校的老师在技术上没有足够的能力使用另一种标记语言,例如 MarkDown。我们最终决定应该使用所见即所得编辑器,这会带来安全缺陷。我们不太担心老师本身,而是更担心恶意的学生可能会得到老师的凭证。

技术背景信息:我们正在使用 Django 1.3 运行,尚未选择特定的编辑器。我们倾向于使用 JavaScript,例如 TINYMCE,但可以说服使用任何允许安全性和易用性的东西。因为所见即所得编辑器将输出要渲染到文档中的 HTML,所以我们不能简单地转义它。

防止恶意代码同时又能让非技术教师轻松撰写帖子的最佳方法是什么?

Non-Technical Background info: I am working for a school and we are building a new website using Django. The teachers that work for the school aren't technologically competent enough to use another MarkUp language such as MarkDown. We eventually decided that we should use a WYSIWYG editor, which poses security flaws. We aren't too worried about the teachers themselves, but more malicious students that might get the teacher's credentials.

Technical Background info: We are running using Django 1.3 and have not chosen a specific editor yet. We are leaning towards a javascript one such as TINYMCE, but can be persuaded to use anything that allows security and ease of use. Because the WYSIWYG editor will output HTML to be rendered into the document, we cannot simply escape it.

What is the best way to prevent malicious code while still making it easy for non-technical teachers to write posts?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

终止放荡 2024-12-03 13:01:06

虽然已经晚了,但您可以尝试 Bleach,它在幕后使用 html5lib,您还可以获得标签平衡。

这是一个完整的片段:

settings.py

BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style', ],
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']

app/forms.py

import bleach
from django.conf import settings

class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)


    class Meta:
        model = MyModel

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        cleaned_text = bleach.clean(myfield, settings.BLEACH_VALID_TAGS, settings.BLEACH_VALID_ATTRS, settings.BLEACH_VALID_STYLES)
        return cleaned_text #sanitize html

您可以阅读 bleach 文档,以便您可以根据自己的需要进行调整。

This is late, but you can try Bleach, under the hood it uses the html5lib, and you'll also get tag balancing.

Here is a complete snippet:

settings.py

BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style', ],
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']

app/forms.py

import bleach
from django.conf import settings

class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)


    class Meta:
        model = MyModel

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        cleaned_text = bleach.clean(myfield, settings.BLEACH_VALID_TAGS, settings.BLEACH_VALID_ATTRS, settings.BLEACH_VALID_STYLES)
        return cleaned_text #sanitize html

You can read the bleach docs, so you can adapt it to your needs.

似狗非友 2024-12-03 13:01:06

您需要在服务器上解析 HTML 并删除任何不符合严格白名单的标签和属性。
您应该将其解析(或至少重新呈现)为严格的 XML,以防止攻击者利用模糊解析器之间的差异。

白名单不得包含

您还必须解析 href=""src="" 中的 URL,并确保它们是相对路径 http:// ,或https://

You need to parse the HTML on the server and remove any tags and attributes that don't meet a strict whitelist.
You should parse it (or at least re-render it) as strict XML to prevent attackers from exploiting differences between fuzzy parsers.

The whitelist must not include <script>, <style>, <link>, or <meta>, and must not include event handler attributes or style="".

You must also parse URLs in href="" and src="" and make sure that they are either relative paths, http://, or https://.

皇甫轩 2024-12-03 13:01:06

添加到 Nitely 的答案,这个答案很好但有点不完整:我还建议使用 Bleach ,但是如果您想使用它来预先批准安全的 CSS 样式,您需要使用 Bleach CSS Sanitizer(单独的 pip 安装到普通漂白剂包),这使得代码设置略有不同奈特利的。

我们在 Django 项目 forms.py 文件中使用以下内容(使用 Django-CKEditor 作为内容小部件)来清理用户输入 ReportPage 的数据。

import bleach 
from bleach.css_sanitizer import CSSSanitizer
from django.conf import settings

css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)

class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())
    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        cleaned_content = bleach.clean(
            content, 
            tags=settings.BLEACH_VALID_TAGS, 
            attributes=settings.BLEACH_VALID_ATTRS, 
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True
        )

我们包含 strip=True 来删除从表单内容中转义的标记。我们还包含协议,以便任何 href 属性(对于“a”标签)和 src 属性(对于“img”标签)都必须是 https(默认情况下启用 http 和 mailto,我们希望将其关闭)。

为了完整起见,在我们的 settings.py 文件中,我们将以下内容定义为用于我们目的的有效标记:

BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code', 
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike', 
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 
    'thead', 'tr', 'tt', 'u', 'ul'
)
    
BLEACH_VALID_ATTRS = {
    '*': ['style', ], # allow all tags to have style attr
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}

BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color','cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family','font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height', 
    'margin', 'margin-bottom', 'margin-left', 'margin-right', 
    'margin-top', 'overflow', 'padding', 'padding-bottom', 
    'padding-left', 'padding-right', 'padding-top', 'pause', 
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi', 
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)

BLEACH_VALID_PROTOCOLS = ('https',)

Adding to Nitely's answer which was great but slightly incomplete: I also recommend using Bleach, but if you want to use it to pre-approve safe CSS styles you need to use Bleach CSS Sanitizer (separate pip install to the vanilla bleach package), which makes for a slightly different code set-up to Nitely's.

We use the below in our Django project forms.py file (using Django-CKEditor as the content widget) to sanitize the data for our user-input ReportPages.

import bleach 
from bleach.css_sanitizer import CSSSanitizer
from django.conf import settings

css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)

class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())
    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        cleaned_content = bleach.clean(
            content, 
            tags=settings.BLEACH_VALID_TAGS, 
            attributes=settings.BLEACH_VALID_ATTRS, 
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True
        )

We include strip=True to remove mark-up that is escaped from the form content. We also include protocols so that any href attrs (for 'a' tags) and src attrs (for 'img' tags) must be https (http and mailto are enabled by default, which we wanted turned off).

For completeness' sake, inside our settings.py file we define the following as valid mark-up for our purposes:

BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code', 
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike', 
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 
    'thead', 'tr', 'tt', 'u', 'ul'
)
    
BLEACH_VALID_ATTRS = {
    '*': ['style', ], # allow all tags to have style attr
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}

BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color','cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family','font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height', 
    'margin', 'margin-bottom', 'margin-left', 'margin-right', 
    'margin-top', 'overflow', 'padding', 'padding-bottom', 
    'padding-left', 'padding-right', 'padding-top', 'pause', 
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi', 
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)

BLEACH_VALID_PROTOCOLS = ('https',)
烟沫凡尘 2024-12-03 13:01:06

@SLaks 是对的,您需要在服务器上进行清理,因为窃取教师凭据的学生可以使用这些凭据直接 POST 到您的服务器。

Python HTML 清理器/清理器/过滤器 讨论可用于 python 的现有 HTML 清理器。

我建议从一个空的白名单开始,然后使用所见即所得编辑器使用每个按钮创建 HTML 片段,以便您了解它生成的 HTML 的种类,然后仅将支持该 HTML 所需的标签/属性列入白名单产生。希望它不使用 CSS style属性,因为它们也可能是 XSS 向量。

@SLaks is right that you need to do the sanitization on the server since students who steal a teacher's credentials could use those credentials to POST directly to your server.

Python HTML sanitizer / scrubber / filter discusses existing HTML sanitizers available for python.

I would suggest starting with an empty white-list, then use the WYSIWYG editor to create a snippet of HTML using each button so that you know the varieties of HTML it produces, and then whitelist only the tags/attributes needed to support the HTML it produces. Hopefully it doesn't use the CSS style attribute because those can also be an XSS vector.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文