基于 DOM 的 XSS 攻击和 InnerHTML

发布于 2024-11-08 11:09:32 字数 776 浏览 1 评论 0原文

如何保护以下基于 DOM 的 XSS 攻击？

具体来说，是否有一个 protected() 函数可以使以下内容安全？如果没有，那么还有其他解决方案吗？例如：给div一个id，然后为元素分配一个onclick处理程序

<?php
function protect()
{
   // For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
   // with ASCII values less than 256 works (ie: \xHH)
   // But is it possible to augment this function to protect against
   // the below DOM based XSS attack?
}
?>

<body>
  <div id="mydiv"></div>
  <script type="text/javascript">
    var xss = "<?php echo protect($_GET["xss"]) ?>";
    $("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>")
  </script>
</body>

我希望得到一个不是“避免使用innerHTML”或“将xss变量正则表达式为[a-zA-Z0-9]”的答案。 .ie：有更通用的解决方案吗？

谢谢

原文

How would one go about securing the below DOM Based XSS attack?

Specifically, is there a protect() function that will make the below safe?
If no, then is there another solution?
eg: Giving the div an id and then later assigning the element an onclick handler

<?php
function protect()
{
   // For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
   // with ASCII values less than 256 works (ie: \xHH)
   // But is it possible to augment this function to protect against
   // the below DOM based XSS attack?
}
?>

<body>
  <div id="mydiv"></div>
  <script type="text/javascript">
    var xss = "<?php echo protect($_GET["xss"]) ?>";
    $("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>")
  </script>
</body>

I'm hoping for an answer that is not "avoid using innerHTML" or "regex the xss variable to [a-zA-Z0-9]"...ie: is there a more general solution?

Thanks

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

临走之时 2024-11-15 11:09:32

扩展 Vineet 的回复，这里有一组需要研究的测试用例：

http://ha.ckers.org /xss.html

回复收藏 0 原文

请别遗忘我 2024-11-15 11:09:32

我一直在研究 PHP 的 DOMDocument 和相关类，以期编写一个可以处理此类内容的 HTML 解析器。目前它还处于开发的早期阶段，距离实际使用还差得远，但我的早期实验似乎显示了这个想法的一些前景。

基本上，您将标记加载到 DOMDocument 中，然后遍历树。对于树中的每个节点，您可以根据允许的节点类型列表检查节点类型。如果节点类型不在列表中，则会将其从树中删除。

您可以使用与此类似的方法来找到一段标记中的所有 SCRIPT 标记并将其删除。如果您可以从提供的标记中提取任何嵌入的脚本，那么基于 DOM 的 XSS 就会变得毫无用处。

这是我正在使用的代码，以及处理 StackOverflow 主页的测试用例。就像我说的，它距离生产质量代码还很远，只不过是一个概念证明。尽管如此，我还是希望你觉得它有用。

<?php
class HtmlClean
{
    private $whiteList      = array (
        '#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b', 
        'big', 'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 
        'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4', 
        'h5', 'h6', 'head', 'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta', 
        'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub', 
        'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul', 
        'var'
    );

    private $attrWhiteList  = array (
        'class', 'id', 'title'
    );

    private $dom            = NULL;

    /**
     * Get current tag whitelist
     * @return array
     */
    public function getWhiteListTags ()
    {
        $this -> whiteList  = array_values ($this -> whiteList);
        return ($this -> whiteList);
    }

    /**
     * Add tag to the whitelist
     * @param string $tagName
     */
    public function addWhiteListTag ($tagName)
    {
        $tagName    = strtolower (trin ($tagName));
        if (!in_array ($tagName, $this -> whiteList))
        {
            $this -> whiteList []   = $tagName;
        }
    }

    /**
     * Remove a tag from the whitelist
     * @param string $tagName
     */
    public function removeWhiteListTag ($tagName)
    {
        if ($index = array_search ($tagName, $this -> whiteList))
        {
            unset ($this -> whiteList [$index]);
        }
    }

    /**
     * Load document markup into the class for cleaning
     * @param string $html The markup to clean
     * @return bool
     */
    public function loadHTML ($html)
    {
        if (!$this -> dom)
        {
            $this -> dom    = new DOMDocument();
        }
        $this -> dom -> preserveWhiteSpace  = false;
        $this -> dom -> formatOutput        = true;
        return $this -> dom -> loadHTML ($html);
    }

    public function outputHtml ()
    {
        $ret    = '';
        if ($this -> dom)
        {
            $ret    = $this -> dom -> saveXML ();
        }
        return ($ret);
    }

    private function cleanAttrs (DOMnode $elem)
    {
        $attrs  = $elem -> attributes;
        $index  = $attrs -> length;
        while (--$index >= 0)
        {
            $attrName   = strtolower ($attrs -> item ($indes) -> name);
            if (!in_array ($attrName, $this -> attrWhiteList))
            {
                $elem -> removeAttribute ($attrName);
            }
        }       
    }

    /**
     * Recursivly remove elements from the DOM that aren't whitelisted
     * @param DOMNode $elem
     * @return array List of elements removed from the DOM
     * @throws Exception If removal of a node failed than an exception is thrown
     */
    private function cleanNodes (DOMNode $elem)
    {
        $removed    = array ();
        if (in_array (strtolower ($elem -> nodeName), $this -> whiteList))
        {
            // Remove non-whitelisted attributes
            if ($elem -> hasAttributes ())
            {
                $this -> cleanAttrs ($elem);
            }
            /*
             * Iterate over the element's children. The reason we go backwards is because
             * going forwards will cause indexes to change when elements get removed
             */
            if ($elem -> hasChildNodes ())
            {
                $children   = $elem -> childNodes;
                $index      = $children -> length;
                while (--$index >= 0)
                {
                    $removed = array_merge ($removed, $this -> cleanNodes ($children -> item ($index)));
                }
            }
        }
        else
        {
            // The element is not on the whitelist, so remove it
            if ($elem -> parentNode -> removeChild ($elem))
            {
                $removed [] = $elem;
            }
            else
            {
                throw new Exception ('Failed to remove node from DOM');
            }
        }
        return ($removed);
    }

    /**
     * Perform the cleaning of the document
     */
    public function clean ()
    {
        $removed    = $this -> cleanNodes ($this -> dom -> getElementsByTagName ('html') -> item (0));
        return ($removed);
    }
}

$test       = file_get_contents( ('http://www.stackoverflow.com/'));
// Windows-stype linebreaks really foul up the works. There's probably a better fix for this
$test       = str_replace (chr (13), '', $test);

$cleaner    = new HtmlClean ();
$cleaner -> loadHTML ($test);

echo ('<h1>Before</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

$start      = microtime (true);
$removed    = $cleaner -> clean ();
$cleanTime  = microtime (true) - $start;

echo ('<h1>Removed tag list</h1>');
foreach ($removed as $elem)
{
    var_dump ($elem -> nodeName);
}

echo ('<h1>After</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

// benchmark
var_dump ($cleanTime);
?>

I've been playing around with PHP's DOMDocument and related classes with a view to writing a HTML parser that can deal with stuff like this. It's at a very early stage of development at the moment and is nowhere near ready for actual use, but my early experiments seem to show some promise for the idea.

Basically, you load your Markup into a DOMDocument, then traverse the tree. For each node in the tree you check what the node type is against a list of allowed node types. If the node type isn't in the list then it's removed from the tree.

You could use an approach similar to this to locate all SCRIPT tags in a piece of markup and remove them. DOM based XSS is rendered toothless if you can pull any embedded scripts out of the markup you've been provided.

This is the code I'm using, along with a test case that processes the StackOverflow home page. Like I said, it's far from production quality code and is little more than a proof of concept. Still, I hope you find it useful.

<?php
class HtmlClean
{
    private $whiteList      = array (
        '#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b', 
        'big', 'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 
        'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4', 
        'h5', 'h6', 'head', 'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta', 
        'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub', 
        'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul', 
        'var'
    );

    private $attrWhiteList  = array (
        'class', 'id', 'title'
    );

    private $dom            = NULL;

    /**
     * Get current tag whitelist
     * @return array
     */
    public function getWhiteListTags ()
    {
        $this -> whiteList  = array_values ($this -> whiteList);
        return ($this -> whiteList);
    }

    /**
     * Add tag to the whitelist
     * @param string $tagName
     */
    public function addWhiteListTag ($tagName)
    {
        $tagName    = strtolower (trin ($tagName));
        if (!in_array ($tagName, $this -> whiteList))
        {
            $this -> whiteList []   = $tagName;
        }
    }

    /**
     * Remove a tag from the whitelist
     * @param string $tagName
     */
    public function removeWhiteListTag ($tagName)
    {
        if ($index = array_search ($tagName, $this -> whiteList))
        {
            unset ($this -> whiteList [$index]);
        }
    }

    /**
     * Load document markup into the class for cleaning
     * @param string $html The markup to clean
     * @return bool
     */
    public function loadHTML ($html)
    {
        if (!$this -> dom)
        {
            $this -> dom    = new DOMDocument();
        }
        $this -> dom -> preserveWhiteSpace  = false;
        $this -> dom -> formatOutput        = true;
        return $this -> dom -> loadHTML ($html);
    }

    public function outputHtml ()
    {
        $ret    = '';
        if ($this -> dom)
        {
            $ret    = $this -> dom -> saveXML ();
        }
        return ($ret);
    }

    private function cleanAttrs (DOMnode $elem)
    {
        $attrs  = $elem -> attributes;
        $index  = $attrs -> length;
        while (--$index >= 0)
        {
            $attrName   = strtolower ($attrs -> item ($indes) -> name);
            if (!in_array ($attrName, $this -> attrWhiteList))
            {
                $elem -> removeAttribute ($attrName);
            }
        }       
    }

    /**
     * Recursivly remove elements from the DOM that aren't whitelisted
     * @param DOMNode $elem
     * @return array List of elements removed from the DOM
     * @throws Exception If removal of a node failed than an exception is thrown
     */
    private function cleanNodes (DOMNode $elem)
    {
        $removed    = array ();
        if (in_array (strtolower ($elem -> nodeName), $this -> whiteList))
        {
            // Remove non-whitelisted attributes
            if ($elem -> hasAttributes ())
            {
                $this -> cleanAttrs ($elem);
            }
            /*
             * Iterate over the element's children. The reason we go backwards is because
             * going forwards will cause indexes to change when elements get removed
             */
            if ($elem -> hasChildNodes ())
            {
                $children   = $elem -> childNodes;
                $index      = $children -> length;
                while (--$index >= 0)
                {
                    $removed = array_merge ($removed, $this -> cleanNodes ($children -> item ($index)));
                }
            }
        }
        else
        {
            // The element is not on the whitelist, so remove it
            if ($elem -> parentNode -> removeChild ($elem))
            {
                $removed [] = $elem;
            }
            else
            {
                throw new Exception ('Failed to remove node from DOM');
            }
        }
        return ($removed);
    }

    /**
     * Perform the cleaning of the document
     */
    public function clean ()
    {
        $removed    = $this -> cleanNodes ($this -> dom -> getElementsByTagName ('html') -> item (0));
        return ($removed);
    }
}

$test       = file_get_contents( ('http://www.stackoverflow.com/'));
// Windows-stype linebreaks really foul up the works. There's probably a better fix for this
$test       = str_replace (chr (13), '', $test);

$cleaner    = new HtmlClean ();
$cleaner -> loadHTML ($test);

echo ('<h1>Before</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

$start      = microtime (true);
$removed    = $cleaner -> clean ();
$cleanTime  = microtime (true) - $start;

echo ('<h1>Removed tag list</h1>');
foreach ($removed as $elem)
{
    var_dump ($elem -> nodeName);
}

echo ('<h1>After</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

// benchmark
var_dump ($cleanTime);
?>

回复收藏 0 原文