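How can I prevent XSS attacks when I need to render HTML from a WYSIWYG editor?

Posted on 2024-11-26 13:01:06

Non-technical background: I work for a school, and we are building a new website with Django. The teachers at the school are not technically proficient enough to use a separate markup language such as Markdown. We eventually decided that we should use a WYSIWYG editor, which introduces security risks. We are not too worried about the teachers themselves; we are more worried about malicious students who might get hold of a teacher's credentials.

Technical background: We are running Django 1.3 and have not chosen a specific editor yet. We are leaning toward a JavaScript one such as TinyMCE, but can be persuaded to use anything that offers both security and ease of use. Because the WYSIWYG editor outputs HTML that is rendered into the document, we can't simply escape it.

What is the best way to block malicious code while still making it easy for non-technical teachers to write posts?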

终止放荡 2024-12-03 13:01:06

This is late, but you can try Bleach. Under the hood it uses html5lib, so you also get tag balancing.

Here is a complete snippet:

settings.py

BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style', ],
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']

app/forms.py

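import bleach
from django import forms
from django.conf import settings

# MyModel and MyWYSIWYGEditor are placeholders for your own model and editor widget.

class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)

    class Meta:
        model = MyModel

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        # Sanitize the submitted HTML against the whitelists from settings.py.
        # Note: the `styles` argument exists in bleach < 5.0; later releases
        # replaced it with `css_sanitizer`.
        cleaned_text = bleach.clean(
            myfield,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            styles=settings.BLEACH_VALID_STYLES,
        )
        return cleaned_text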

See the Bleach docs to adapt this to your needs.

似狗非友 2024-12-03 13:01:06

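You need to parse the HTML on the server and remove any tags and attributes that do not match a strict whitelist.
You should parse it (or at least re-render it) as strict XML, to prevent attackers from exploiting differences between lenient parsers.

The whitelist must not include

You must also parse the URLs in href="" and src="" and make sure that they are relative paths, http://, or https://.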
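The URL check described above could be sketched roughly as follows, using only the Python standard library. The names `is_safe_url` and `ALLOWED_SCHEMES` are hypothetical, and the sketch assumes the attribute value has already been extracted and HTML-decoded by a real parser; treat it as an illustration rather than a complete defence.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist for this sketch: only these absolute schemes pass.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str) -> bool:
    """Return True for relative URLs and for http:// or https:// URLs."""
    # Browsers ignore whitespace and control characters around or inside a
    # scheme, so drop them before deciding what the scheme is.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        # Malformed URLs are rejected outright.
        return False
    if parts.scheme == "":
        # No scheme at all: a relative URL.
        return True
    return parts.scheme.lower() in ALLOWED_SCHEMES
```

A sanitizer would call this for every href and src value and drop the attribute when it returns False, so that values such as `javascript:alert(1)` never reach the rendered page.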

皇甫轩 2024-12-03 13:01:06

Adding to Nitely's answer, which was great but slightly incomplete: I also recommend Bleach, but if you want it to pre-approve safe CSS styles you need the Bleach CSS sanitizer, which is installed separately from the base bleach package (pip install 'bleach[css]'). That makes the code set-up slightly different from Nitely's.

We use the below in our Django project forms.py file (using Django-CKEditor as the content widget) to sanitize the data for our user-input ReportPages.

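import bleach
from bleach.css_sanitizer import CSSSanitizer
from ckeditor.widgets import CKEditorWidget  # provided by django-ckeditor
from django import forms
from django.conf import settings

from .models import ReportPage  # assumed location of the model

# Only the CSS properties listed in settings.BLEACH_VALID_STYLES are kept
# inside style attributes.
css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)

class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())

    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        cleaned_content = bleach.clean(
            content,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True,
        )
        # Return the sanitized HTML so it replaces the raw input in cleaned_data.
        return cleaned_content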

We pass strip=True so that disallowed mark-up is removed from the form content instead of being escaped. We also pass protocols so that any href attrs (for 'a' tags) and src attrs (for 'img' tags) must be https (http and mailto are also enabled by default, which we wanted turned off).
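To make the difference between stripping and escaping concrete, here is a minimal standard-library sketch. It is not Bleach and is not meant to replace it; `TinySanitizer`, `sanitize`, and `ALLOWED_TAGS` are hypothetical names, and for simplicity the sketch drops every attribute.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this illustration only.
ALLOWED_TAGS = {"p", "b", "i"}


class TinySanitizer(HTMLParser):
    """Keeps whitelisted tags (without attributes); strips or escapes the rest."""

    def __init__(self, strip: bool):
        super().__init__(convert_charrefs=True)
        self.strip = strip
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # All attributes are dropped, which also removes event handlers.
            self.out.append(f"<{tag}>")
        elif not self.strip:
            # Escape mode: show the disallowed tag as visible text.
            self.out.append(escape(self.get_starttag_text()))
        # Strip mode: the disallowed tag is silently removed.

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        elif not self.strip:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text content is always re-escaped before it is written out.
        self.out.append(escape(data, quote=False))


def sanitize(html_text: str, strip: bool) -> str:
    parser = TinySanitizer(strip)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With the input `<p onclick="x()">Hi <script>alert(1)</script></p>`, strip mode yields `<p>Hi alert(1)</p>`, while escape mode keeps the script tags as harmless visible text (`&lt;script&gt;...`). In both modes the script never executes; the choice only affects what the reader sees.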

For completeness' sake, inside our settings.py file we define the following as valid mark-up for our purposes:

BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code', 
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike', 
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 
    'thead', 'tr', 'tt', 'u', 'ul'
)
    
BLEACH_VALID_ATTRS = {
    '*': ['style', ], # allow all tags to have style attr
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}

BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color','cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family','font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height', 
    'margin', 'margin-bottom', 'margin-left', 'margin-right', 
    'margin-top', 'overflow', 'padding', 'padding-bottom', 
    'padding-left', 'padding-right', 'padding-top', 'pause', 
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi', 
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)

BLEACH_VALID_PROTOCOLS = ('https',)
