When to filter/sanitize data: before database insertion or before display?
As I prepare to tackle input data filtering and sanitization, I'm curious whether there's a best (or most common) practice. Is it better to filter/sanitize the data (HTML, JavaScript, etc.) before inserting it into the database, or should that be done when the data is being prepared for display in HTML?
A few notes:
- I'm doing this in PHP, but I suspect the answer to this is language agnostic. But if you have any recommendations specific to PHP, please share!
- This is not an issue of escaping the data for database insertion. I already have PDO handling that quite well.
Thanks!
Comments (6)
When it comes to displaying user submitted data, the generally accepted mantra is to "Filter input, escape output."
I would recommend against escaping things like HTML entities before the data goes into the database, because you never know when HTML will not be your display medium. Also, different situations require different types of output escaping. For example, embedding a string in JavaScript requires different escaping than embedding it in HTML. Escaping ahead of time may lull you into a false sense of security.
So, the basic rule of thumb is, sanitize before use and specifically for that use; not pre-emptively.
(Please note, I am not talking about escaping output for SQL, just for display. Please still do escape data bound for an SQL string).
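To illustrate the point about context-specific escaping, here is a minimal sketch. It is in Python only to keep it short and self-contained; in PHP, htmlspecialchars plays roughly the role that html.escape plays here. The helper names for_html and for_js_string are made up for this example, and the extra replacements in for_js_string are one possible precaution for inline scripts, not the only way to do it.

```python
import html
import json

# The value is stored and passed around unescaped.
raw = 'Tom & "Jerry" </script>'


def for_html(text: str) -> str:
    """Escape for an HTML text node or a quoted attribute value."""
    return html.escape(text, quote=True)


def for_js_string(text: str) -> str:
    """Produce a JavaScript string literal (including the quotes).

    json.dumps handles quotes, backslashes and control characters.
    The replacements additionally hide <, > and & so the literal
    cannot close an inline <script> element.
    """
    literal = json.dumps(text)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


if __name__ == "__main__":
    print(for_html(raw))
    print(for_js_string(raw))
```

The same raw string ends up with two different encodings, which is why escaping it once before storage cannot be right for both destinations.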
I like to store the data in its original form.
I only escape/filter the data depending on where I'm using it.
There are at least two types of filtering/sanitization you should care about: one for SQL and one for HTML.
Obviously, the first one has to be taken care of before/when inserting the data into the database, to prevent SQL injection.
But you already know that, as you said, so I won't talk about it more.
The second one, on the other hand, is a more interesting question.
So: htmlspecialchars or an equivalent is probably not that much of a CPU-eater, so it probably doesn't matter much.
BTW, the first solution is also nice if users are using something like bbcode/markdown/wiki when inputting the data, and you are rendering it in HTML...
At least, as long as it's displayed more often than it's updated, and especially if you don't use any cache to store the clean HTML version.
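One way to read the "displayed more often than it's updated" remark is to keep the original text and a pre-rendered HTML version side by side, and redo the rendering only when the text changes. The sketch below assumes that reading. It is Python rather than PHP for brevity, the Post class is hypothetical, and render_markup is a toy stand-in for a real bbcode/markdown renderer that only understands **bold**.

```python
import html
import re


def render_markup(raw: str) -> str:
    """Toy renderer: escape everything, then turn **text** into <strong>."""
    escaped = html.escape(raw)
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)


class Post:
    """Keeps the user's original text plus a rendered copy of it."""

    def __init__(self, raw: str) -> None:
        self.raw = ""
        self.rendered = ""
        self.update(raw)

    def update(self, raw: str) -> None:
        # Rendering happens once per edit, not once per page view.
        self.raw = raw
        self.rendered = render_markup(raw)


if __name__ == "__main__":
    post = Post("**hi** <script>")
    print(post.raw)       # what the user gets back when editing
    print(post.rendered)  # what goes into the page
```

The raw text stays available for editing, and the stored HTML can always be regenerated if the renderer changes.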
Sanitize it for the database before putting it in the database, if necessary (i.e. if you're not using a database interactivity layer that handles that for you). Sanitize it for display before display.
Storing things in a presently unnecessary quoted form just causes too many problems.
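As a rough sketch of that split: the database layer takes care of quoting through bound parameters, and HTML escaping happens only when a page is built. This uses Python's sqlite3 module as a stand-in for PDO prepared statements; the comments table and the function names are invented for the example.

```python
import html
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical comments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def save_comment(conn: sqlite3.Connection, body: str) -> int:
    # The bound parameter handles SQL quoting; the text is stored as typed.
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def load_comment(conn: sqlite3.Connection, comment_id: int) -> str:
    row = conn.execute(
        "SELECT body FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0]


def render_comment(conn: sqlite3.Connection, comment_id: int) -> str:
    # HTML escaping is applied here, at display time, and nowhere earlier.
    return "<p>" + html.escape(load_comment(conn, comment_id)) + "</p>"


if __name__ == "__main__":
    db = open_db()
    cid = save_comment(db, "<b>O'Reilly</b>; DROP TABLE comments;--")
    print(load_comment(db, cid))
    print(render_comment(db, cid))
```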
I always say escape things immediately before passing them to the place they need to be escaped. Your database doesn't care about HTML, so escaping HTML before storing in the database is unnecessary. If you ever want to output as something other than HTML, or change which tags are allowed/disallowed, you might have a bit of work ahead of you. Also, it's easier to remember to do the escaping right when it needs to be done, than at some much earlier stage in the process.
It's also worth noting that HTML-escaped strings can be much longer than the original input. If I put a Japanese username in a registration form, the original string might only be 4 Unicode characters, but HTML escaping may convert it to a long string like "&#12345;&#67890;&#18504;&#31337;". Then my 4-character username is too long for your database field, and gets stored as two Japanese characters plus half an escape code, which also probably prevents me from logging in.
Beware that browsers themselves tend to escape some things, like non-English text, in submitted forms, and there will always be that smartass who uses a Japanese username everywhere. So you may want to actually unescape HTML before storing.
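The length problem is easy to reproduce. Below is a small Python sketch: the username and the 20-character column width are invented for the example, and html.unescape stands in for whatever entity-decoding function your platform provides (html_entity_decode in PHP).

```python
import html

name = "山田太郎"  # 4 Unicode characters

# What the name looks like when written as numeric character references.
escaped = "".join(f"&#{ord(ch)};" for ch in name)

COLUMN_WIDTH = 20  # hypothetical VARCHAR(20) username column
truncated = escaped[:COLUMN_WIDTH]

if __name__ == "__main__":
    print(len(name), len(escaped))
    print(truncated)
    # Decoding before storage brings the value back to 4 characters.
    print(html.unescape(escaped) == name)
```

Stored unescaped, the name fits easily; stored escaped, it is cut off in the middle of a reference and no longer matches what the user typed.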
Mostly it depends on what you are planning to do with the input, as well as your development environment.
In most cases you want the original input. This way you get the power to tweak your output to your heart's content without fear of losing the original. This also allows you to troubleshoot issues such as broken output: you can always see whether your filters are buggy or the customer's input is erroneous.
On the other hand, some short semantic data could be filtered immediately. 1) You don't want messy phone numbers in the database, so for such things it could be good to sanitize. 2) You work in a multi-programmer environment and don't want some other programmer to accidentally output data without escaping. However, for most cases raw data is better IMO.
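For the phone number case, filtering on input could look something like the sketch below. It is Python for brevity, normalize_phone is a made-up helper, and the rule it applies (keep digits only, accept 7 to 15 of them) is just an assumption for the example; a real application would pick rules that match its own numbers.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to digits only, or reject it."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, ...
    if not 7 <= len(digits) <= 15:
        raise ValueError(f"not a plausible phone number: {raw!r}")
    return digits


if __name__ == "__main__":
    print(normalize_phone("(555) 123-4567"))
    print(normalize_phone("555.123.4567"))
```

Both spellings normalize to the same stored value, which is the kind of tidiness this answer is after. Free-form text such as comments would still be stored as entered.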