Filtering user input - clarification needed
I would like to clarify the proper way to filter user input with PHP. For example, I have a web form that a user enters information into. When it is submitted, the data from the form is entered into a database.
My understanding is that you don't want to sanitize the data going into the database, apart from escaping it with something like mysql_escape_string; instead, you sanitize it when displaying it on the front end, with something like htmlentities or htmlspecialchars. However, if you want, you can validate/filter the user input when the form is submitted to make sure the data is in the proper format. For example, if a field is for an email address, you validate that it has the proper email format. Is that correct?
My next question is: what do you do with the data when you re-display it in a web form? Let's say the user is allowed to edit the information in that form after they have filled it out and the information has been added to the database. They go back in and see the data in the fields they originally filled in. Do you have to sanitize the data for it to show correctly in the form fields? For example, there is a field called My Title, and the person enters My title is "Manager". Note the quotation marks around Manager: when you output the value as is into the form field, it breaks because of the quotation marks:
<input type="text" name="title" value="My title is "Manager"">
So don't you have to use something like htmlentities to turn the quotation marks into their HTML entities? Otherwise the value of the field would come out as just: My title is
Hope this makes sense.
Comments (1)
Nothing says you can't sanitize data before database insertion. After all, if your script/site/company has a certain policy regarding what's acceptable in a form field, it's best to strip out anything that's not allowed before saving it. That way you only sanitize once, before data insertion/update, rather than EVERY TIME you retrieve the data.
If you allow HTML entities for (say) accented characters, but not HTML tags, then you have to check for both invalid entities (&foobar;?) and HTML tags. Since you don't allow them, don't bother storing them. If you require a valid email address, then check that it's at least RFC 5322 compliant, and only store it once the user has entered proper data. (Whether that email address actually exists is another matter.)
Now, let's get one thing straight. There's a difference between sanitization and escaping. Sanitization literally means cleaning up: you're removing anything you don't want from the data. You can either silently drop it, or present an error to the user and tell them to fix it. Escaping, on the other hand, is just a means of encoding data so that it's displayed properly.
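To make the distinction concrete, here is a minimal sketch of the three operations side by side. The function names (isValidEmail, sanitizeTitle, escapeHtml) and the "no HTML tags" policy are assumptions made for illustration, not anything prescribed above. Note also that PHP's built-in FILTER_VALIDATE_EMAIL is only a syntax check and not a full RFC 5322 parser.

```php
<?php
// Validation: accept or reject the input without changing it.
// FILTER_VALIDATE_EMAIL only checks syntax; it does not tell you
// whether the mailbox actually exists.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Sanitization: remove whatever the (hypothetical) policy forbids,
// in this case HTML tags.
function sanitizeTitle(string $title): string
{
    return trim(strip_tags($title));
}

// Escaping: keep every character, but encode it for the HTML context.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sanitizing drops the <b> tags; the quotes are legitimate data and stay.
$clean = sanitizeTitle('My title is <b>"Manager"</b>');

// Escaping changes nothing about the data, only how it is written into HTML.
$escaped = escapeHtml($clean);
```

Sanitizing changes what is stored, so it belongs before the insert; escaping only changes how the stored value is written out, so it belongs at output time.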
With your My title is "Manager" string, you don't need to sanitize it, as there's nothing really wrong or offensive about it. What you do need to do is escape it, with at least htmlspecialchars(), so that the embedded double quotes don't "break" your form. If you embed it verbatim, most browsers will see it as having value="My title is" followed by some bogus attribute/garbage Manager"". So you run it through htmlspecialchars and end up with My title is &quot;Manager&quot;, which embeds into the value="" perfectly with no trouble. No sanitization, just proper encoding.
Now, when that form is submitted, you do have to sanitize/validate again, as the data has been in the hands of a potentially malicious user, and could have been changed to something like:
My title is <script>document.location='http://attacksite.com';</script>pwn me
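Basically, the workflow should be: sanitize/validate the submitted data, escape it for the database, and insert it; then later, retrieve it and escape it for HTML when you display it.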
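The form round trip discussed above could be sketched roughly as follows. The function names validateTitle and renderTitleInput, and the policy of no HTML tags and at most 100 bytes, are assumptions made for illustration. The database step is left out; it would sit between the two calls, using the database's own escaping or a parameterized query.

```php
<?php
// On submit: validate against a hypothetical policy (no HTML tags, 1-100 bytes).
// Returns the trimmed title, or null so the caller can show an error.
function validateTitle(string $raw): ?string
{
    $title = trim($raw);
    if ($title === '' || strlen($title) > 100) {
        return null;
    }
    if ($title !== strip_tags($title)) {
        return null; // contains markup, which this policy forbids
    }
    return $title;
}

// On display: escape for the HTML attribute context, at output time only.
function renderTitleInput(string $title): string
{
    $escaped = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return '<input type="text" name="title" value="' . $escaped . '">';
}

// A harmless value passes validation unchanged, quotes intact.
$stored = validateTitle('My title is "Manager"');

// When re-displayed, the quotes are encoded so the attribute is not cut short.
$html = renderTitleInput($stored);

// The tampered value is rejected on the next submit.
$attack = validateTitle("My title is <script>document.location='http://attacksite.com';</script>pwn me");
```

Rejecting the tampered value rather than silently stripping it is a choice made in this sketch: strip_tags() removes the tags but keeps the text between them, so stripping alone would leave the script body in the title.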