How can I sanitize user input with PHP?
Is there a catchall function somewhere that works well for sanitizing user input for SQL injection and XSS attacks, while still allowing certain types of HTML tags?
There's no catchall function, because there are multiple concerns to be addressed.
1. SQL Injection
Today, as a best practice, every PHP project should generally be using prepared statements via PHP Data Objects (PDO). They prevent errors caused by a stray quote and provide a full-featured defense against injection. It's also the most flexible and secure way to access your database.
Check out (The only proper) PDO tutorial for pretty much everything you need to know about PDO. (Sincere thanks to top SO contributor, @YourCommonSense, for this great resource on the subject.)
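As a minimal sketch of what a prepared statement looks like with PDO: the in-memory SQLite database, the `users` table and the `find_users_by_name()` helper are assumptions made up for illustration, so that the example runs on its own.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for a real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table, created here so the example is self-contained.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

/**
 * Look up users by name (hypothetical helper). The value travels as a
 * bound parameter, so it is never parsed as part of the SQL text.
 */
function find_users_by_name(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = ?');
    $stmt->execute([$name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$rows   = find_users_by_name($pdo, 'alice');
$attack = find_users_by_name($pdo, "' OR '1'='1"); // treated as a literal name
```

The injection attempt simply matches no rows, because the quote characters are data rather than SQL syntax.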
2. XSS - Validate HTML input
HTML Purifier has been around a long time and is still actively updated. You can use it to only allow harmless HTML, so the resulting code can be used in your HTML pages. Works great with many WYSIWYG editors, but it might be heavy for some use cases.
3. XSS - Sanitize data on the way out
There is no catchall function here either. Different contexts need different escaping:
htmlspecialchars for HTML output, unless the data was properly sanitized and is allowed to display HTML.
json_encode is a safe way to provide values from PHP to JavaScript.
4. Calling shell commands
Do you call external shell commands using the exec() or system() functions, or the backtick operator? If so, in addition to SQL injection and XSS you have another concern to address: users running malicious commands on your server. Use escapeshellcmd to escape the entire command, or escapeshellarg to escape individual arguments.
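A small sketch of escaping individual arguments with escapeshellarg. The `build_grep_command()` helper and the file name are hypothetical, and the quoting shown in the comment assumes a POSIX shell (Windows quotes differently).

```php
<?php
/**
 * Build a grep command line from untrusted values (hypothetical helper).
 * Each value is quoted separately with escapeshellarg(); the "--" tells
 * grep that everything after it is an operand, not an option.
 */
function build_grep_command(string $pattern, string $file): string
{
    return 'grep -- ' . escapeshellarg($pattern) . ' ' . escapeshellarg($file);
}

// The whole payload ends up as one single-quoted argument:
// grep -- 'it'\''s; rm -rf /' 'notes.txt'
$command = build_grep_command("it's; rm -rf /", 'notes.txt');
```

The resulting string could then be handed to exec(); the `; rm -rf /` part is only ever seen by grep as part of the search pattern.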
PHP 5.2 introduced the filter_var function. It supports a great many SANITIZE and VALIDATE filters.
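For illustration, a few filter_var calls with built-in filters. The sample values and the age range are made up; the filter constants are the standard ones from the filter extension.

```php
<?php
// VALIDATE filters return the (converted) value on success and false on failure.
$email    = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$badEmail = filter_var('not an email', FILTER_VALIDATE_EMAIL);

// Integer validation with an assumed allowed range of 0..150.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 0, 'max_range' => 150],
]);
$badAge = filter_var('42abc', FILTER_VALIDATE_INT);

// SANITIZE filters modify the value instead of rejecting it:
// this one strips everything except digits, plus and minus.
$digits = filter_var('order #123', FILTER_SANITIZE_NUMBER_INT);
```

Note the difference in behaviour: a VALIDATE filter tells you whether to accept the input, while a SANITIZE filter silently changes it.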
It's a common misconception that user input can be filtered. PHP even had a (now defunct) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (or cleaning, or whatever people call it).
What you should do to avoid problems is quite simple: whenever you embed a piece of data within foreign code, you must format it according to the rules of that code. But such rules can be too complicated to follow manually. For example, in SQL, the rules for strings, numbers and identifiers are all different. For your convenience, in most cases there is a dedicated tool for such embedding. For example, when some data has to be used in an SQL query, instead of adding a variable directly to the SQL string, it has to be passed through a parameter in the query, using a prepared statement, which takes care of all the proper formatting.
Another example is HTML: if you embed strings within HTML markup, you must escape them with htmlspecialchars. This means that every single echo or print statement should use htmlspecialchars.
A third example is shell commands: if you are going to embed strings (such as arguments) in external commands and call them with exec, then you must use escapeshellcmd and escapeshellarg.
Also, a very compelling example is JSON. The rules are so numerous and complicated that you would never be able to follow them all manually. That's why you should never create a JSON string manually, but always use a dedicated function, json_encode(), which will correctly format every bit of data.
And so on and so forth ...
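The HTML and JSON cases above can be sketched as follows. The sample comment string is made up, and the JSON_HEX_* flags are an extra precaution chosen for this sketch (they keep `<`, `>`, `&` and quotes out of a value embedded in a script block), not something the answer prescribes.

```php
<?php
$comment = '<script>alert("x")</script> & \'more\'';

// HTML context: the same data, formatted by the rules of HTML.
$html = '<p>' . htmlspecialchars($comment, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</p>';

// JavaScript context: the same data, formatted by the rules of JSON.
$json = json_encode($comment, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
$js   = '<script>var comment = ' . $json . ';</script>';
```

The stored data never changes; only its representation does, at the moment it is embedded into each target format.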
The only case where you need to actively filter data is when you're accepting preformatted input, for example if you let your users post HTML markup that you plan to display on the site. However, you would be wise to avoid this at all costs, since no matter how well you filter it, it will always be a potential security hole.
Do not try to prevent SQL injection by sanitizing input data.
Instead, do not allow data to be used in creating your SQL code. Use prepared statements (i.e. parameters in a template query) that use bound variables. It is the only way to be guaranteed against SQL injection.
Please see my website http://bobby-tables.com/ for more about preventing SQL injection.
No. You can't generically filter data without any context of what it's for. Sometimes you'd want to take a SQL query as input and sometimes you'd want to take HTML as input.
You need to filter input on a whitelist -- ensure that the data matches some specification of what you expect. Then you need to escape it before you use it, depending on the context in which you are using it.
The process of escaping data for SQL - to prevent SQL injection - is very different from the process of escaping data for (X)HTML, to prevent XSS.
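A rough sketch of the first step, validating input against a whitelist. The `pick_allowed()` helper and the list of sort columns are hypothetical names chosen for illustration.

```php
<?php
/** Return $input only if it is on the whitelist, otherwise the default (hypothetical helper). */
function pick_allowed(string $input, array $allowed, string $default): string
{
    // Strict comparison: the value must match an expected entry exactly.
    return in_array($input, $allowed, true) ? $input : $default;
}

$allowedColumns = ['name', 'created_at'];

$good = pick_allowed('created_at', $allowedColumns, 'name');
$bad  = pick_allowed('name; DROP TABLE users', $allowedColumns, 'name');

// Escaping still happens later, per context, e.g. when echoing into HTML:
$label = htmlspecialchars($good, ENT_QUOTES, 'UTF-8');
```

Validation decides whether a value is acceptable at all; escaping for SQL or HTML remains a separate step at the point of use.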
PHP has the nice new filter_input functions now, which for instance liberate you from finding 'the ultimate e-mail regex', now that there is a built-in FILTER_VALIDATE_EMAIL type.
My own filter class (which uses JavaScript to highlight faulty fields) can be initiated by either an ajax request or a normal form post. (See the example below.)
<?
/**
* Pork Formvalidator. validates fields by regexes and can sanitize them. Uses PHP filter_var built-in functions and extra regexes
* @package pork
*/
Of course, keep in mind that you need to do your SQL query escaping too, depending on what type of database you are using (mysql_real_escape_string() is useless for an SQL Server database, for instance). You probably want to handle this automatically at an appropriate application layer, such as an ORM. Also, as mentioned above: for outputting to HTML, use PHP's other dedicated functions, like htmlspecialchars ;)
To really allow HTML input, with stripped classes and/or tags, depend on one of the dedicated XSS validation packages. DO NOT WRITE YOUR OWN REGEXES TO PARSE HTML!
No, there is not.
First of all, SQL injection is an input filtering problem, and XSS is an output escaping one - so you wouldn't even execute these two operations at the same time in the code lifecycle.
Basic rules of thumb
Use strip_tags() to filter out unwanted HTML.
Escape output with htmlspecialchars(), and be mindful of the 2nd and 3rd parameters here.
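To make the two rules of thumb concrete, here is a short sketch with made-up sample strings. It shows what strip_tags leaves behind and how the second (flags) and third (encoding) parameters of htmlspecialchars change the result.

```php
<?php
$input = '<p>Hello <b>world</b><script>alert(1)</script></p>';

// Keep only <b>; other tags are removed, but their text content stays.
// Attributes on allowed tags are NOT removed by strip_tags().
$stripped = strip_tags($input, '<b>');

$quote = "O'Reilly";

// 2nd parameter (flags): ENT_COMPAT leaves single quotes alone,
// ENT_QUOTES escapes them too. 3rd parameter: the character encoding.
$compat = htmlspecialchars($quote, ENT_COMPAT, 'UTF-8');
$quotes = htmlspecialchars($quote, ENT_QUOTES, 'UTF-8');
```

Passing the flags and encoding explicitly keeps the behaviour the same across PHP versions, since the defaults have changed over time.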
To address the XSS issue, take a look at HTML Purifier. It is fairly configurable and has a decent track record.
As for the SQL injection attacks, the solution is to use prepared statements. The PDO library and mysqli extension support these.
Methods for safe database interaction in PHP
Using modern versions of MySQL and PHP
1. Set charset explicitly:
MySQLi
PDO
2. Use prepared statements
MySQLi prepared statements:
PDO Prepared Statements:
Compared to MySQLi prepared statements, PDO supports more database drivers and named parameters:
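A minimal sketch of the two steps with PDO and named parameters. To stay runnable without a server it uses an in-memory SQLite database; the MySQL DSN in the comment (host, database name, credentials) and the `posts` table are placeholders, not values from the original answer.

```php
<?php
// For MySQL the charset would be set explicitly, e.g. (placeholder values):
//   new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass);
// and with MySQLi: $mysqli->set_charset('utf8mb4');
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
]);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)');

// Named parameters: each :name is bound to a value at execute() time.
$insert = $pdo->prepare('INSERT INTO posts (title, author) VALUES (:title, :author)');
$insert->execute([
    ':title'  => "Robert'); DROP TABLE posts;--",
    ':author' => 'bobby',
]);

$select = $pdo->prepare('SELECT title FROM posts WHERE author = :author');
$select->execute([':author' => 'bobby']);
$title = $select->fetchColumn();
```

The hostile-looking title is stored and read back verbatim, and the table is still there afterwards.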
What you are describing here is two separate issues:
1) User input should always be assumed to be bad.
Using prepared statements and/or escaping with mysql_real_escape_string (mysqli_real_escape_string since the old mysql extension was removed in PHP 7) is definitely a must.
PHP also has filter_input built in which is a good place to start.
2) This is a large topic, and it depends on the context of the data being output. For HTML there are solutions such as htmlpurifier out there.
As a rule of thumb, always escape anything you output.
Both issues are far too big to go into in a single post, but there are lots of posts which go into more detail:
Methods PHP output
Safer PHP output
One trick that can help in the specific circumstance where you have a page like /mypage?id=53 and you use the id in a WHERE clause is to ensure that id definitely is an integer before it reaches the query. But of course that only cuts out one specific attack, so read all the other answers.
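One possible way to enforce the integer check is sketched below. This is not the answer's original code; the `parse_id()` helper and the nine-digit length cap (to stay well inside the integer range) are assumptions for illustration.

```php
<?php
/**
 * Return the id as an int, or null if the raw value is anything other
 * than a short string of decimal digits (hypothetical helper).
 */
function parse_id($raw): ?int
{
    if (!is_string($raw) || !ctype_digit($raw) || strlen($raw) > 9) {
        return null;
    }
    return (int) $raw;
}

// For /mypage?id=53 this yields 53; anything else yields null.
$id = parse_id($_GET['id'] ?? null);
```

A null result can then be answered with an error page, and a valid id should still be passed to the query as a bound parameter.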
If you're using PostgreSQL, the input from PHP can be escaped with pg_escape_literal(); see its documentation for details.
You never sanitize input.
You always sanitize output.
The transforms you apply to data to make it safe for inclusion in an SQL statement are completely different from those you apply for inclusion in HTML are completely different from those you apply for inclusion in Javascript are completely different from those you apply for inclusion in LDIF are completely different from those you apply to inclusion in CSS are completely different from those you apply to inclusion in an Email....
By all means validate input - decide whether you should accept it for further processing or tell the user it is unacceptable. But don't apply any change to representation of the data until it is about to leave PHP land.
A long time ago someone tried to invent a one-size-fits-all mechanism for escaping data, and we ended up with "magic_quotes", which didn't properly escape data for all output targets and resulted in different installations requiring different code to work.
The easiest way to avoid mistakes in sanitizing input and escaping data is to use a PHP framework like Symfony or Nette, or a part of such a framework (templating engine, database layer, ORM).
Templating engines like Twig or Latte have output escaping on by default, so you don't have to check manually whether you have properly escaped your output for its context (the HTML or JavaScript part of a web page).
A framework sanitizes input automatically, and you shouldn't use the $_POST, $_GET or $_SESSION variables directly, but through mechanisms like routing, session handling etc.
And for the database (model) layer there are ORM frameworks like Doctrine, or wrappers around PDO like Nette Database.
You can read more about it here - What is a software framework?
Just wanted to add, on the subject of output escaping, that if you use PHP's DOMDocument to build your HTML output, it will automatically escape in the right context. An attribute (value="") and the inner text of a <span> are not equal.
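A brief sketch of that approach, with a made-up payload used both as an attribute value and as text content. The exact serialized markup can vary between libxml versions, so no literal output is shown here.

```php
<?php
$payload = '"><script>alert(1)</script>';

$doc  = new DOMDocument('1.0', 'UTF-8');
$span = $doc->createElement('span');

// Attribute context: DOMDocument quotes and escapes the value as needed.
$span->setAttribute('title', $payload);

// Text context: createTextNode() stores plain text, escaped on output.
$span->appendChild($doc->createTextNode($payload));
$doc->appendChild($span);

// Serialize just the <span> element to an HTML string.
$html = $doc->saveHTML($span);
```

Because the tree holds data rather than markup strings, the payload cannot break out of the attribute or introduce a new element when the document is serialized.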
To be safe against XSS read this:
OWASP XSS Prevention Cheat Sheet