When to filter/sanitize data: before database insertion or before display?

Posted 2024-08-02 00:54:54

As I prepare to tackle the issue of input data filtering and sanitization, I'm curious whether there's a best (or most used) practice? Is it better to filter/sanitize the data (of HTML, JavaScript, etc.) before inserting the data into the database, or should it be done when the data is being prepared for display in HTML?

A few notes:

  • I'm doing this in PHP, but I suspect the answer to this is language agnostic. But if you have any recommendations specific to PHP, please share!
  • This is not an issue of escaping the data for database insertion. I already have PDO handling that quite well.

Thanks!

Comments (6)

无所的.畏惧 2024-08-09 00:54:54

When it comes to displaying user-submitted data, the generally accepted mantra is "Filter input, escape output."

I would recommend against escaping things like HTML entities before the data goes into the database, because you never know when HTML will not be your display medium. Also, different situations require different types of output escaping. For example, embedding a string in JavaScript requires different escaping than embedding it in HTML. Escaping early may lull you into a false sense of security.

So, the basic rule of thumb is, sanitize before use and specifically for that use; not pre-emptively.

(Please note, I am not talking about escaping output for SQL, just for display. Please still do escape data bound for an SQL string).
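
As a rough illustration of why one escaping step cannot serve every output context, here is a minimal Python sketch (the question is about PHP, but the idea carries over). The helper names `for_html_body` and `for_inline_script` are made up for this example; `html.escape` plays roughly the role that `htmlspecialchars` plays in PHP.

```python
import html
import json

user_input = '</script><script>alert("hi")</script>'


def for_html_body(text: str) -> str:
    """Escape text that will be placed between HTML tags or in a quoted attribute."""
    return html.escape(text, quote=True)


def for_inline_script(text: str) -> str:
    """Encode text as a JavaScript string literal for use inside a <script> block."""
    literal = json.dumps(text)
    # JSON leaves "<", ">" and "&" alone, so "</script>" could still end the
    # block early; rewrite them as \uXXXX escapes, which JavaScript accepts.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


print(for_html_body(user_input))
print(for_inline_script(user_input))
```

The two results differ, and neither is safe in the other's context, which is the reason for escaping at the point of use rather than once at storage time.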

久而酒知 2024-08-09 00:54:54

I like to have/store the data in its original form.
I only escape/filter the data depending on where I'm using it.

  • on a web page - encode all HTML
  • in SQL - kill quotes
  • in a URL - URL-encode
  • on printers - encode escape commands
  • on whatever - encode it for that job
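
A small sketch of that idea in Python, using only the standard library: one raw value is stored untouched and then encoded separately for each destination. Note one swap: instead of "killing quotes" by hand for SQL, this example binds the value as a query parameter, which is what the asker says PDO already does for them. The table name `notes` and the sample string are invented for illustration.

```python
import html
import sqlite3
from urllib.parse import quote

raw = "Tom & Jerry's <b>café</b>"

# SQL: bind the raw value as a parameter instead of editing its quotes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", (raw,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]

# Web page: encode HTML metacharacters at output time.
as_html = html.escape(stored)

# URL: percent-encode the value for use as a path or query component.
as_url = quote(stored, safe="")

print(stored)
print(as_html)
print(as_url)
```

The stored value comes back exactly as entered, so any new output format added later can start from the original text.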

慵挽 2024-08-09 00:54:54

There are at least two types of filtering/sanitization you should care about:

  • SQL
  • HTML

Obviously, the first one has to be taken care of before/when inserting the data into the database, to prevent SQL injection.

But you already know that, as you said, so I won't talk about it more.

The second one, on the other hand, is a more interesting question:

  • If your users must be able to edit their data, you will want to return it to them exactly as they first entered it, which means you have to store a "non-htmlspecialchars-escaped" version.
  • If you want to display some HTML, you may use something like HTMLPurifier. It is very powerful, but it might require too many resources if you run it on every piece of data each time it is displayed.

So:

  • If you want to display some HTML and use a heavy tool to validate/filter it, I'd say you need to store an already filtered version in the database, so that you don't bring the server down by re-creating it each time the data is displayed.
    • But you also need to store the "original" version (see what I said above).
    • In that case, I'd probably store both versions in the database, even if it takes more space. Or at least use a good caching mechanism, so the clean version isn't re-created over and over.
  • If you don't want to display any HTML, you will use htmlspecialchars or an equivalent, which is probably not much of a CPU-eater, so it probably doesn't matter much.
    • You still need to store the "original" version.
    • But escaping when you output the data might be OK.

BTW, the first solution is also nice if users enter something like BBCode/Markdown/wiki markup and you render it to HTML.

At least, as long as the data is displayed more often than it's updated, and especially if you don't use any cache to store the clean HTML version.
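
The "store both versions" approach might look something like the following Python sketch. Everything here is an assumption made for illustration: the `posts` table, its `body_raw`/`body_html` columns, and `render_html`, which is a cheap stand-in for an expensive step such as HTMLPurifier or a Markdown renderer.

```python
import html
import sqlite3


def render_html(raw: str) -> str:
    """Hypothetical stand-in for an expensive filter or markup renderer."""
    return "<p>" + html.escape(raw) + "</p>"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body_raw TEXT, body_html TEXT)"
)


def save_post(raw: str) -> int:
    """Store the original text plus the HTML rendered once at write time."""
    cur = conn.execute(
        "INSERT INTO posts (body_raw, body_html) VALUES (?, ?)",
        (raw, render_html(raw)),
    )
    return cur.lastrowid


def update_post(post_id: int, raw: str) -> None:
    """Re-render only when the text changes, not on every page view."""
    conn.execute(
        "UPDATE posts SET body_raw = ?, body_html = ? WHERE id = ?",
        (raw, render_html(raw), post_id),
    )


def load_for_display(post_id: int) -> str:
    return conn.execute(
        "SELECT body_html FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]


def load_for_editing(post_id: int) -> str:
    return conn.execute(
        "SELECT body_raw FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]


post_id = save_post("1 < 2 & 3 > 2")
print(load_for_display(post_id))
print(load_for_editing(post_id))
```

One consequence of this design is that the rendered column is only a cache: if the filtering rules change, it has to be regenerated from `body_raw`, which is another reason to keep the original.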

普罗旺斯的薰衣草 2024-08-09 00:54:54

Sanitize it for the database before putting it in the database, if necessary (i.e. if you're not using a database interactivity layer that handles that for you). Sanitize it for display before display.

Storing things in a presently unnecessary quoted form just causes too many problems.

囍笑 2024-08-09 00:54:54

I always say escape things immediately before passing them to the place they need to be escaped. Your database doesn't care about HTML, so escaping HTML before storing in the database is unnecessary. If you ever want to output as something other than HTML, or change which tags are allowed/disallowed, you might have a bit of work ahead of you. Also, it's easier to remember to do the escaping right when it needs to be done, than at some much earlier stage in the process.

It's also worth noting that HTML-escaped strings can be much longer than the original input. If I put a Japanese username in a registration form, the original string might only be 4 Unicode characters, but HTML escaping may convert it to a long string of "&#12345;&#67890;&#18504;&#31337;". Then my 4-character username is too long for your database field, and gets stored as two Japanese characters plus half an escape code, which also probably prevents me from logging in.

Beware that browsers tend to escape some things like non-English text in submitted forms themselves, and there will always be that smartass who uses a Japanese username everywhere. So you may want to actually unescape HTML before storing.
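
The length problem is easy to reproduce. The sketch below is Python and assumes a hypothetical 20-character column; the sample username is invented. It converts non-ASCII characters to numeric character references, truncates the result the way a too-short field would, and then shows that unescaping recovers the original.

```python
import html

username = "山田太郎"  # four characters as typed

# Turn every non-ASCII character into a numeric character reference.
escaped = username.encode("ascii", "xmlcharrefreplace").decode("ascii")

COLUMN_WIDTH = 20  # hypothetical VARCHAR(20) column
truncated = escaped[:COLUMN_WIDTH]

print(len(username), len(escaped))
print(truncated)

# Undoing the references before storage recovers the short original.
restored = html.unescape(escaped)
print(restored == username)
```

Here the truncated value holds two complete references and a fragment of a third, which matches the "two characters plus half an escape code" failure described above.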

月竹挽风 2024-08-09 00:54:54

Mostly it depends on what you are planning to do with the input, as well as your development environment.

In most cases you want the original input. This way you get the power to tweak your output to your heart's content without fear of losing the original. This also allows you to troubleshoot issues such as broken output: you can always check whether your filters are buggy or the customer's input is erroneous.

On the other hand, some short semantic data could be filtered immediately. 1) You don't want messy phone numbers in the database, so for such things it could be good to sanitize. 2) You don't want some other programmer to accidentally output data without escaping, and you work in a multi-programmer environment. However, for most cases raw data is better IMO.
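
For the phone-number case, input-time filtering could be as simple as the following Python sketch. The function name `normalize_phone` and the rule it applies (keep digits and one leading "+") are assumptions for illustration; real validation rules depend on the countries you need to support.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to digits, keeping one leading "+" if present."""
    text = raw.strip()
    prefix = "+" if text.startswith("+") else ""
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError("no digits in phone number: " + repr(raw))
    return prefix + digits


print(normalize_phone(" +1 (555) 010-9999 "))
print(normalize_phone("555.010.9999"))
```

This kind of normalization is lossy by design, which is acceptable only because the punctuation in a phone number carries no meaning worth preserving.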
