How to prevent XSS (cross-site scripting) while allowing HTML input

Posted on 2024-11-28 23:00:53

I have a website that allows users to enter HTML through a TinyMCE rich editor control. Its purpose is to let users format text using HTML.

This user entered content is then outputted to other users of the system.

However, this means someone could insert JavaScript into the HTML in order to perform an XSS attack on other users of the system.

What is the best way to filter out JavaScript code from a HTML string?

If I perform a regular expression check for <SCRIPT> tags, that is a good start, but an attacker could still attach JavaScript to the onclick attribute of a tag.

Is there a fool-proof way to strip out all JavaScript code, whilst leaving the rest of the HTML untouched?

For my particular implementation, I'm using C#.

Comments (5)

不离久伴 2024-12-05 23:00:53

Peter, I'd like to introduce you to two concepts in security:

Blacklisting - Disallow things you know are bad.

Whitelisting - Allow things you know are good.

While both have their uses, blacklisting is insecure by design.

What you are asking for is, in fact, blacklisting. As soon as there is an alternative to <script> (such as <img src="bad" onerror="hack()"/>), a filter that only knows about <script> will let it through.
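
To make the point concrete, here is a minimal C# sketch of a blacklist filter that only targets <script> blocks. The names BlacklistDemo and StripScriptTags are hypothetical and exist only for this illustration; the sketch is meant to show the failure mode, not to be used as a filter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlacklistDemo
{
    // Naive blacklist: removes <script>...</script> blocks and nothing else.
    public static string StripScriptTags(string html)
    {
        return Regex.Replace(
            html,
            @"<script\b[^>]*>.*?</script\s*>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling StripScriptTags on <b>hi</b><script>alert(1)</script> returns <b>hi</b>, but the <img src="bad" onerror="hack()"/> payload comes back unchanged, because the filter only knows about the one construct it was told to look for.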

Whitelisting, on the other hand, allows you to specify the exact conditions you are allowing.

For example, you would have the following rules:

  • allow only these tags: b, i, u, img
  • allow only these attributes: src, href, style

That is just the theory. In practice, you must parse the HTML accordingly, hence the need for a proper HTML parser.

第七度阳光i 2024-12-05 23:00:53

Microsoft have produced their own anti-XSS library, Microsoft Anti-Cross Site Scripting Library V4.0:

The Microsoft Anti-Cross Site Scripting Library V4.0 (AntiXSS V4.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encodes anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes. New features in this version of the Microsoft Anti-Cross Site Scripting Library include:

  • A customizable safe list for HTML and XML encoding
  • Performance improvements
  • Support for Medium Trust ASP.NET applications
  • HTML Named Entity Support
  • Invalid Unicode detection
  • Improved Surrogate Character Support for HTML and XML encoding
  • LDAP Encoding Improvements
  • application/x-www-form-urlencoded encoding support

It uses a whitelist approach to strip out potential XSS content.

Here are some relevant links related to AntiXSS:

ㄖ落Θ余辉 2024-12-05 23:00:53

If you want to allow some HTML but not all, you should use something like OWASP AntiSamy, which allows you to build a whitelisted policy over which tags and attributes you allow.

HTMLPurifier might also be an alternative.

It is of key importance that this is a whitelist approach: new attributes and events are added to HTML5 all the time, so any blacklist would fail within a short time, and knowing all the "bad" attributes is difficult in the first place.

Edit: Oh, and regex is hard to get right here. HTML can take many different forms: tags can be unclosed, attribute values can be quoted (single or double) or unquoted, and tags can contain line breaks and all kinds of whitespace, to name a few issues. I would rely on a well-tested library like the ones I mentioned above.

岁月染过的梦 2024-12-05 23:00:53

Regular expressions are the wrong tool for the job; you need a real HTML parser or things will turn bad. You need to parse the HTML string and then remove all elements and attributes except the allowed ones (whitelist approach; blacklists are inherently insecure). You can take the lists used by Mozilla as a starting point. There you also have a list of attributes that take URL values: you need to verify that these are either relative URLs or use an allowed protocol (typically only http:/https:/ftp:, in particular no javascript: or data:). Once you've removed everything that isn't allowed, you serialize your data back to HTML, and now you have something that is safe to insert on your web page.
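
The parse, filter, validate and serialize steps described above can be sketched in C# roughly as follows. This is a sketch under loud assumptions, not a production sanitizer: it swaps in the framework's XML parser (System.Xml.Linq) for a real HTML parser, so it only accepts input that happens to be well-formed XML and falls back to encoding everything else as plain text (that includes unquoted attributes and named entities such as &nbsp;). The class name WhitelistSanitizer, its tag and attribute lists, and the decision to drop disallowed elements together with their content are all hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class WhitelistSanitizer
{
    // Illustrative policy only; a real policy would be longer and carefully reviewed.
    static readonly HashSet<string> AllowedTags = new HashSet<string> { "b", "i", "u", "a", "img" };
    static readonly HashSet<string> VoidTags = new HashSet<string> { "img" };
    static readonly HashSet<string> AllowedAttributes = new HashSet<string> { "href", "src", "alt" };
    static readonly HashSet<string> UrlAttributes = new HashSet<string> { "href", "src" };
    static readonly HashSet<string> AllowedSchemes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "http", "https", "ftp" };

    public static string Sanitize(string html)
    {
        XElement root;
        try
        {
            // Assumption: the input is well-formed XML; the wrapper lets a fragment parse.
            root = XElement.Parse("<root>" + html + "</root>", LoadOptions.PreserveWhitespace);
        }
        catch (XmlException)
        {
            // Fail closed: whatever the parser rejects is rendered as plain text.
            return WebUtility.HtmlEncode(html);
        }
        // Rebuild the tree from allowed parts only, then serialize it again.
        return string.Concat(root.Nodes()
            .Select(n => Clean(n))
            .Where(n => n != null)
            .Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    // Returns a sanitized copy of the node, or null if the node must be dropped.
    static XNode Clean(XNode node)
    {
        if (node is XText text)
            return new XText(text.Value); // CDATA becomes ordinary escaped text

        // Comments, processing instructions, namespaced and unknown elements are dropped.
        if (!(node is XElement element)
            || element.Name.NamespaceName.Length != 0
            || !AllowedTags.Contains(element.Name.LocalName))
            return null;

        var clean = new XElement(element.Name.LocalName);
        foreach (XAttribute attribute in element.Attributes())
        {
            string name = attribute.Name.LocalName;
            if (attribute.IsNamespaceDeclaration || attribute.Name.NamespaceName.Length != 0) continue;
            if (!AllowedAttributes.Contains(name)) continue;
            if (UrlAttributes.Contains(name) && !IsSafeUrl(attribute.Value)) continue;
            clean.SetAttributeValue(name, attribute.Value);
        }
        foreach (XNode child in element.Nodes())
        {
            XNode cleanChild = Clean(child);
            if (cleanChild != null) clean.Add(cleanChild);
        }
        // Force <b></b> rather than <b />, which an HTML parser reads as an open tag.
        if (clean.IsEmpty && !VoidTags.Contains(clean.Name.LocalName)) clean.Add(string.Empty);
        return clean;
    }

    // Accepts relative URLs and absolute URLs whose scheme is explicitly allowed.
    public static bool IsSafeUrl(string value)
    {
        string url = value.Trim();
        // Fail closed on control characters, which browsers may strip before parsing.
        if (url.Any(c => char.IsControl(c))) return false;
        int colon = url.IndexOf(':');
        int delimiter = url.IndexOfAny(new[] { '/', '?', '#' });
        bool hasScheme = colon >= 0 && (delimiter < 0 || colon < delimiter);
        if (!hasScheme) return true; // no scheme, so a relative URL
        return AllowedSchemes.Contains(url.Substring(0, colon));
    }
}
```

With this sketch, Sanitize applied to <b onclick="x()">hi</b><script>alert(1)</script> yields <b>hi</b>, and <a href="javascript:alert(1)">x</a> yields <a>x</a>, while an https: link is kept as is. For real-world HTML, a well-tested sanitizer built on an actual HTML parser remains the better choice.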

何处潇湘 2024-12-05 23:00:53

Currently, for .NET 4.x I use Microsoft's AntiXSS library. For .NET Core, I believe there are many AntiXSS-style libraries on NuGet.

To encode:

Microsoft.Security.Application.Encoder.JavaScriptEncode("your string");
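
For comparison, when a value is written into the HTML body rather than into a script, the framework's built-in System.Net.WebUtility.HtmlEncode (part of .NET itself, not of AntiXSS) does a similar job. The wrapper below is a hypothetical illustration; note that encoding turns all markup into text, so it suits fields where no HTML formatting is allowed rather than rich-editor content.

```csharp
using System.Net;

public static class PlainTextOutput
{
    // Encodes HTML metacharacters such as '<', '>' and '&',
    // so the browser renders the input as literal text.
    public static string Encode(string untrusted)
    {
        return WebUtility.HtmlEncode(untrusted);
    }
}
```

After PlainTextOutput.Encode, a <script> payload no longer contains any angle brackets, so the browser displays it instead of running it.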

Old answer:
I tried to neutralize tags by rewriting them like this:

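using System.Text.RegularExpressions;

public class Utility
{
    // Inserts a space after every '<' so the browser no longer treats it as the start of a tag.
    // Note that this neutralizes all markup, including formatting tags.
    public static string PreventXSS(string sInput)
    {
        if (sInput == null)
            return string.Empty;
        // "<\s*" matches '<' plus any whitespace after it, so one pass normalizes both to "< ".
        return Regex.Replace(sInput, @"<\s*", "< ");
    }
}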

Usage before saving to the database:

    string sResultNoXSS = Utility.PreventXSS(varName);

I tested it with input such as:

<script>alert('hello XSS')</script>

Without the filter, this runs in the browser. After applying PreventXSS, the input above becomes:

< script>alert('hello XSS')< /script>

(There is a space after each <.)

As a result, the browser no longer runs the script.
