What is the correct way to detect whether string inputs contain HTML or not?

Posted 2024-12-19 22:38:39

When receiving user input on forms, I want to detect whether fields like "username" or "address" contain markup that has a special meaning in XML (RSS feeds) or (X)HTML (when displayed).

So which of these is the correct way to detect whether the input entered doesn't contain any special characters in HTML and XML context?

if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)

or

if (htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data)

or

if (preg_match("/[^\p{L}\-.']/u", $text)) // problem: also catches symbols

Have I missed anything else, like byte sequences or other tricky ways to get markup tags around things like "javascript:"? As far as I'm aware, all XSS and CSRF attacks require < or > around the values to get the browser to execute the code (well, at least from Internet Explorer 6 or later anyway) - is this correct?

I am not looking for something to reduce or filter input. I just want to locate dangerous character sequences when used in XML or HTML context. (strip_tags() is horribly unsafe. As the manual says, it doesn't check for malformed HTML.)

Update

I think I need to clarify, because a lot of people are mistaking this question for one about basic security via "escaping" or "filtering" dangerous characters. This is not that question, and most of the simple answers given wouldn't solve that problem anyway.

Update 2: Example

  • User submits input
  • if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)
  • I save it

Now that the data is in my application I do two things with it - 1) display in a format like HTML - or 2) display inside a format element for editing.

The first one is safe in XML and HTML context:

<h2><?php print $input; ?></h2>
<xml><item><?php print $input; ?></item></xml>

The second form is more dangerous, but it should still be safe:

<input value="<?php print htmlspecialchars($input, ENT_QUOTES, 'UTF-8');?>">

Update 3: Working Code

You can download the gist I created and run the code as a text or HTML response to see what I'm talking about. This simple check passes the http://ha.ckers.org XSS Cheat Sheet, and I can't find anything that makes it through. (I'm ignoring Internet Explorer 6 and below.)

I started another bounty to reward anyone who can show a problem with this approach or a weakness in its implementation.

Update 4: Ask a DOM

It's the DOM that we want to protect - so why not just ask it? Timur's answer led to this:

function not_markup($string)
{
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string("<root>$string</root>");
    if ($xml === false)
    {
        // Not well-formed once wrapped in <root>, so treat the input as markup.
        return false;
    }
    return $xml->children()->count() === 0;
}

if (not_markup($_POST['title'])) ...

Comments (13)

初见你 2024-12-26 22:38:39

I don't think you need to implement a huge algorithm to check whether a string has unsafe data; filters and regular expressions do the work. But if you need a more complex check, maybe this will fit your needs:

<?php
$strings = array();
$strings[] = <<<EOD
    ';alert(String.fromCharCode(88,83,83))//\';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//\";alert(String.fromCharCode(88,83,83))//--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
EOD;
$strings[] = <<<EOD
    '';!--"<XSS>=&{()}
EOD;
$strings[] = <<<EOD
    <SCRIPT SRC=http://ha.ckers.org/xss.js></SCRIPT>
EOD;
$strings[] = <<<EOD
    This is a safe text
EOD;
$strings[] = <<<EOD
    <IMG SRC="javascript:alert('XSS');">
EOD;
$strings[] = <<<EOD
    <IMG SRC=javascript:alert('XSS')>
EOD;
$strings[] = <<<EOD
    <IMG SRC=javascript:alert('XSS')>
EOD;
$strings[] = <<<EOD
    perl -e 'print "<IMG SRC=java\0script:alert(\"XSS\")>";' > out
EOD;
$strings[] = <<<EOD
    <SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>
EOD;
$strings[] = <<<EOD
    </TITLE><SCRIPT>alert("XSS");</SCRIPT>
EOD;



libxml_use_internal_errors(true);
// Baseline: a plain value gives <root> exactly one child element.
$sourceXML = '<root><element>value</element></root>';
$sourceXMLDocument = simplexml_load_string($sourceXML);
$sourceCount = $sourceXMLDocument->children()->count();

foreach( $strings as $string ){
    $unsafe = false;
    $XML = '<root><element>'.$string.'</element></root>';
    $XMLDocument = simplexml_load_string($XML);
    if( $XMLDocument===false ){
        // The wrapped input is no longer well-formed XML.
        $unsafe = true;
    }else{
        // A different child count means the input closed <element> and added siblings.
        // Well-formed markup nested inside <element> does not change this count.
        $count = $XMLDocument->children()->count();
        if( $count!=$sourceCount ){
            $unsafe = true;
        }
    }

    echo ($unsafe?'Unsafe':'Safe').': <pre>'.htmlspecialchars($string,ENT_QUOTES,'utf-8').'</pre><br />'."\n";
}
?>
时光无声 2024-12-26 22:38:39

In a comment above, you wrote:

Just stop the browser from treating the string as markup.

This is an entirely different problem to the one in the title. The approach in the title is usually wrong. Stripping out tags just mangles input and can lead to data loss. Ever tried to talk about HTML on a blog that strips tags? Frustrating.

The solution that is usually the correct one is to do as you said in your comment - to stop the browser from treating the string as markup. This - literally taken - is not possible. What you do instead is encode the content as HTML.

Consider the following data:

<strong>Test</strong>

Now, you can look at this in one of two ways. You can look at it as literal data - a sequence of characters. Or you can look at it as HTML - markup that includes strongly emphasised text.

If you just dump that out into an HTML document, you are treating it as HTML. You can't treat it as literal data in that context. What you need is HTML that will output the literal data. You need to encode it as HTML.

Your problem is not that you have too much HTML - it's that you have too little. When you output <, you are outputting raw data in an HTML context. You need to convert it to &lt;, which is the HTML representation of that data, before outputting it.

PHP offers a few different options for doing this. The most direct is to use htmlspecialchars() to convert it into HTML, and then nl2br() to convert the line breaks into <br> elements.
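As a rough illustration of that last paragraph, here is a minimal sketch of encoding at output time. The helper name encode_for_html() is made up for this example and is not from the answer; it only combines htmlspecialchars() and nl2br() in the order described.

```php
<?php
// Hypothetical helper (name invented for this sketch).
function encode_for_html(string $text): string
{
    // Encode &, <, >, " and ' so the browser displays them as literal characters.
    $encoded = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Insert <br> before each newline only after encoding, so the <br> tags are not escaped.
    return nl2br($encoded, false);
}

echo encode_for_html("<strong>Test</strong>\nsecond line");
```

Run against the example data above, the browser would show the text <strong>Test</strong> literally instead of rendering bold text.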

安静 2024-12-26 22:38:39

If you're just "looking for protection for print '<h3>' . $name . '</h3>'", then yes, at least the second approach is adequate, since it checks whether the value would be interpreted as markup if it weren't escaped. (In this case, $name appears in element content, and only the characters &, <, and > have special meaning there.) (For href and similar attributes, a check for "javascript:" may be necessary, but as you stated in a comment, that isn't a goal.)

For official sources, I can refer to the XML specification:

  • Content production in section 3.1: Here, content consists of elements, CDATA sections, processing instructions, and comments (which must begin with <), references (which must begin with &), and character data (which contains any other legal character). (Although a leading > is treated as character data in element content, many people usually escape it along with <, and it's better safe than sorry to treat it as special.)

  • Attribute value production in section 2.3: A valid attribute value consists of either references (which must begin with &) or character data (which contains any other legal character, but not < or the quote symbol used to wrap the attribute value). If you need to place string inputs in attributes in addition to element content, the characters " and ' need to be checked in addition to &, <, and possibly > (and other characters illegal in XML).

  • Section 2.2: Defines what Unicode code points are legal in XML. In particular, null is illegal in an XML document and may not display properly in HTML.

HTML5 (the latest working draft, which is a work in progress) describes a very elaborate parsing algorithm for HTML documents:

  • Element content corresponds to the "data state" in the parsing algorithm.
    Here, the string input should not contain a null character, < (which begins a new tag), or &
    (which begins a character reference).
  • Attribute values correspond to the "before attribute value state"
    in the parsing algorithm.
    For simplicity, we assume the attribute value is wrapped in double quotation marks. In that case, the parser moves to the
    "attribute value (double-quoted) state".
    In this case, the string input should not contain a null character, " (which ends the attribute value), or & (which begins a character reference).
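The character lists above (XML and HTML5) could be turned into a detection check roughly like the following. This is only a sketch: the function name and the 'content' / 'attribute' context labels are assumptions of mine, and > is included in both contexts purely on the better-safe-than-sorry reasoning mentioned earlier.

```php
<?php
// Hypothetical helper: reports whether $input contains a character that is
// significant in the given context ('content' or 'attribute').
function has_special_chars(string $input, string $context = 'content'): bool
{
    if ($context === 'attribute') {
        // NUL, &, <, > plus both quote characters, which can end an attribute value.
        return preg_match('/[\x00&<>"\']/', $input) === 1;
    }
    // Element content: NUL, & (starts a reference), < (starts a tag), and > to be cautious.
    return preg_match('/[\x00&<>]/', $input) === 1;
}

var_dump(has_special_chars("O'Brien"));              // bool(false)
var_dump(has_special_chars("O'Brien", 'attribute')); // bool(true)
```

The same input can be harmless in element content and significant in an attribute value, which is why the check takes the context as a parameter.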

If string inputs are to be placed in attribute values (unless placing them there is solely for display purposes), there are additional considerations to keep in mind. For example, HTML 4 specifies:

User agents should interpret attribute values as follows:

  • Replace character entities with characters,
  • Ignore line feeds,
  • Replace each carriage return or tab with a single space.

User agents may ignore leading and trailing white space in CDATA
attribute values[.]

Attribute value normalization is also specified in the XML
specification, but apparently not in HTML5.


EDIT (Apr. 25, 2019): Also, be suspicious of inputs containing:

  • the null code point (as it can cause parse errors in certain places, as specified in the HTML5 specification), or
  • any code point illegal in XML (as it will cause parse errors upon reading the XML document),

...assuming htmlspecialchars doesn't escape those code points already.
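For the code-point check, one possible sketch is a single regular expression built from the Char production in section 2.2 of the XML 1.0 specification. The function name is hypothetical, and the sketch assumes the input is meant to be UTF-8; with the /u modifier, preg_match() fails on malformed UTF-8, so such input is rejected too.

```php
<?php
// Hypothetical helper: true only if every code point is legal in XML 1.0.
function is_legal_xml_text(string $input): bool
{
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $pattern = '/\A[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/u';
    // preg_match() returns 1 on a match, 0 on no match, false on invalid UTF-8.
    return preg_match($pattern, $input) === 1;
}

var_dump(is_legal_xml_text("hello\tworld")); // bool(true)
var_dump(is_legal_xml_text("a\0b"));         // bool(false)
```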

地狱即天堂 2024-12-26 22:38:39

HTML Purifier does a good job and is very easy to implement. You could also use a Zend Framework filter like Zend_Filter_StripTags.

HTML Purifier doesn't just fix HTML.

盛夏已如深秋| 2024-12-26 22:38:39

I think you answered your own question. The function htmlspecialchars() does exactly what you need, but you should not use it until you write the user input to a page. To store it in a database there are other functions, like mysqli_real_escape_string().

As a rule of thumb, one can say that you should escape user input only when needed, for the given target system:

  1. Escaping user input often means a loss of the original data, and different target systems (HTML output / SQL / execution) need different escaping. They can even conflict with each other.
  2. You have to escape the data for the given purpose anyway, always. You should not trust even the entries from your database. So escaping when reading from user input does not have any big advantage, but double escaping can lead to invalid data.

In contrast to escaping, validating the content is a good thing to do early. If you expect an integer, only accept integers, otherwise refuse the user input.
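A small sketch of "validate early" for the integer case, using filter_var() with FILTER_VALIDATE_INT. The function name, the idea of an "age" field, and the 0 to 150 range are all made up for illustration.

```php
<?php
// Hypothetical validator for an "age" form field.
// Returns the integer, or null when the input should be refused.
function parse_age(string $raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    // filter_var() returns false when the value is not an integer in range.
    return $age === false ? null : $age;
}

var_dump(parse_age('42'));  // int(42)
var_dump(parse_age('4<2')); // NULL
```

Escaping (htmlspecialchars() for HTML, the database layer's own mechanism for SQL) would still happen later, at the point where the value is written to each target.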

不知所踪 2024-12-26 22:38:39

The correct way to detect whether string inputs contain HTML tags, or any other markup that has a special meaning in XML or (X)HTML when displayed (other than being an entity), is simply

if (mb_strpos($data, '<') === FALSE AND mb_strpos($data, '>') === FALSE)

You are correct! All XSS and CSRF attacks require < or > around the values to get the browser to execute the code (at least from IE6+).

Considering the output context given, this is sufficient to safely display in a format like HTML:

<h2><?php print $input; ?></h2>
<xml><item><?php print $input; ?></item></xml>

Of course, if we have any entity in the input, like &aacute;, a browser will not output it as &aacute;, but as á, unless we use a function like htmlspecialchars when doing the output. In this case, even < and > would also be safe.

In the case of using the string input as the value of an attribute, the safety depends on the attribute.

If the attribute is an input value, we must quote it and use a function like htmlspecialchars in order to have the same content back for editing.

<input value="<?php print htmlspecialchars($input, ENT_QUOTES, 'UTF-8');?>">

Again, even the < and > characters would be safe here.

We may conclude that we do not have to do any kind of detection and rejection of the input, provided we always use htmlspecialchars to output it and our context always fits the above cases (or equally safe ones).

[And we also have a number of ways to safely store it in the database, preventing SQL exploits.]

What if the user wants his "username" to be &amp; is not an &? It contains neither < nor >... will we detect and reject it? Will we accept it? How will we display it? (This input gives interesting results in the new bounty!)

Finally, if our context expands, and we will use the string input as an anchor href, then our whole approach suddenly changes dramatically. But this scenario is not included in the question.

(It is worth mentioning that even when using htmlspecialchars, the output of a string input may differ if the character encodings are different at each step.)
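To see what happens with a username that already looks like an entity, here is a short sketch. It relies only on htmlspecialchars() and its documented fourth parameter, double_encode; the variable names are mine.

```php
<?php
// The literal characters the user typed.
$username = '&amp; is not an &';

// Default behaviour: every & is encoded, so the browser shows exactly what was typed.
$encoded = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // &amp;amp; is not an &amp;

// With double_encode set to false, the existing entity is left alone,
// so the browser would show "& is not an &" and the original text is lost.
$lossy = htmlspecialchars($username, ENT_QUOTES, 'UTF-8', false);
echo $lossy, "\n";   // &amp; is not an &amp;
```

Printing the raw input without any encoding has the same lossy effect as the second call, which is the "interesting result" for this kind of input.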

寒冷纷飞旳雪 2024-12-26 22:38:39

I am certainly not a security expert, but from what I gather something like your suggested

if (htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data)

should work to prevent you from passing on contaminated strings, provided you get your encoding right.

XSS attacks that don't require '<' or '>' rely on the string being handled in a JavaScript block right there and then, which, from how I read your question, is not what you are concerned with in this situation.
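To make the scope of that check concrete, here is a small sketch that wraps it in a function (the function name is invented) and runs three inputs through it. The last input is my own illustration: it contains no < or >, so it passes, which is fine for element content but would matter if the value were ever written into an attribute without further escaping.

```php
<?php
// Hypothetical wrapper around the check quoted above.
function passes_noquotes_check(string $data): bool
{
    // With ENT_NOQUOTES only &, < and > are converted, so quotes never cause a mismatch.
    return htmlspecialchars($data, ENT_NOQUOTES, 'UTF-8') === $data;
}

var_dump(passes_noquotes_check('Tom'));                 // bool(true)
var_dump(passes_noquotes_check('<b>Tom</b>'));          // bool(false)
var_dump(passes_noquotes_check('" onfocus="alert(1)')); // bool(true)
```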

风筝有风,海豚有海 2024-12-26 22:38:39

I suggest you take a look at the xss_clean function from CodeIgniter. I know you don't want to clean, sanitize, or filter anything. You just want to "detect bad behaviour" and reject it. That's exactly why I recommend you look at this function's code.

IMO, we can find deep and strong knowledge of XSS vulnerabilities there, including everything you want and need for your question.

Then, my short / direct answer to you would be:

if (xss_clean($data) === $data)

Now, you don't need to use the whole CodeIgniter framework just because you need this single function, of course. But I believe you may want to grab the whole CI_Security class (at /system/core/Security.php) and do a few modifications to eliminate other dependencies.

As you will see, the xss_clean code is quite complex, as XSS vulnerabilities really are, and I would just trust it and not try to "reinvent this wheel"... IMHO, you can't get rid of XSS vulnerabilities by merely detecting a dozen characters.

痴意少年 2024-12-26 22:38:39

filter_input + FILTER_SANITIZE_STRING (there are lots of flags you can choose from)

:- http://www.php.net/manual/en/filter.filters.sanitize.php

澜川若宁 2024-12-26 22:38:39

You could use a regular expression if you know the character sets that are allowed. If the username contains a character that isn't allowed, throw an error:

[a-zA-Z0-9_.-]

Test your regular expressions here: http://www.perlfect.com/articles/regextutor.shtml

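<?php
$username = "abcdef";
// Anchored so the whole string must consist of allowed characters;
// the unanchored pattern would match as soon as one allowed character appears.
$pattern = '/\A[a-zA-Z0-9_.-]+\z/';
if (preg_match($pattern, $username) !== 1) {
    throw new InvalidArgumentException('Username contains characters that are not allowed.');
}
?>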
太傻旳人生 2024-12-26 22:38:39

If the reason for the question is XSS prevention, there are several ways to exploit an XSS vulnerability. A great cheatsheet about this is the XSS Cheatsheet at ha.ckers.org.

But, detection is useless in this case. You only need prevention, and the correct use of htmlspecialchars/htmlentities on your text inputs before saving them to your database is faster and better than detecting bad input.

带上头具痛哭 2024-12-26 22:38:39

Regex is still the most efficient way of solving your problem. It doesn't matter what frameworks you plan to use, or are advised to use; the most efficient way would still be custom regex code. You can test the string with a regex, and remove (or convert) the affected section using an escaping function such as htmlspecialchars().
No need to install any other framework, or use some long-winded application.

川水往事 2024-12-26 22:38:39

You can make use of the strip_tags function in PHP. This function will strip HTML and PHP tags from given data.

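For example, if $data is the variable that holds your content, you can use it like this:

// Wrapper name is hypothetical; the original snippet showed only the body.
function has_no_tags($data)
{
    // strip_tags() shortens the string only if it found something to strip.
    return strlen($data) === strlen(strip_tags($data));
}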

It will check stripped content against the original content. If both are equal then we can hope there aren't any HTML tags, and it returns true. Otherwise, it returns false as it found some HTML tags.
