Output or input filtering?

Posted 2024-09-27 11:56:03

I constantly see people writing "filter your inputs", "sanitize your inputs", and "don't trust user data", but I only agree with the last one. I consider trusting any external data a bad idea, even when the source is internal relative to the system.

Input filtering:
This is the most common approach I see.
Take the form POST data or any other external source of information and enforce some boundaries when saving it: make sure that text is text, numbers are numbers, SQL is valid SQL, and HTML is valid HTML that does not contain harmful markup. Then save the "safe" data in the database.

But when fetching the data, you just use it raw from the database.

In my personal opinion, the data is never really safe.
Filtering everything you get from forms and URLs sounds easy, but in reality it is much harder than that: data that is safe for one language might not be safe for another.
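
A minimal sketch of that last point, assuming Python and its standard `html` and `json` modules (the sample string and variable names are made up for illustration). It escapes a value for HTML once at save time and then reuses the stored value in other contexts:

```python
import html
import json

raw = 'Tom & "Jerry" <3'

# Input filtering: escape for HTML once, at save time, and store the result.
stored_filtered = html.escape(raw)

# Inside an HTML page the stored value displays as intended...
html_fragment = f"<p>{stored_filtered}</p>"

# ...but a non-HTML consumer (here a JSON payload) receives the
# HTML entities as literal text instead of the original characters.
json_payload = json.dumps({"name": stored_filtered})

# Escaping again on output, a common defensive habit, double-encodes it.
double_escaped = html.escape(stored_filtered)

print(stored_filtered)
print(json_payload)
print(double_escaped)
```

The stored value is only correct for the one context it was escaped for; every other consumer has to undo the escaping first or show mangled text.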

Output filtering:
When doing it this way, I save the raw, unaltered data, whatever it might be, into the database using prepared statements, and then filter out the problematic code when accessing the data. This has its own advantage:
It adds a layer between the HTML and the server-side script, which I consider to be data access separation of sorts.
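
As a rough illustration of the storage half, here is a sketch using Python's built-in `sqlite3` module with a parameterized query. The `comments` table and the sample string are hypothetical; the question does not name a language or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

raw = "Robert'); DROP TABLE comments;-- <script>alert(1)</script>"

# The ? placeholder passes the value separately from the SQL text,
# so the raw string is stored unchanged and is never parsed as SQL.
conn.execute("INSERT INTO comments (body) VALUES (?)", (raw,))
conn.commit()

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
print(stored == raw)
```

The parameterized query protects the database only. The stored value still contains markup, which is why it has to be escaped when it is output.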

Now data is filtered depending on the context. For example, I can present the data from the database in an HTML document as plain escaped text, as HTML, or as anything else, anywhere.
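
Context-dependent output could look something like the sketch below, which assumes Python's standard library (`html.escape`, `json.dumps`, `urllib.parse.quote`). The three contexts are examples chosen for illustration, not an exhaustive list.

```python
import html
import json
from urllib.parse import quote

# The same raw value, as it would come back from the database.
raw = '<b>Tom</b> & "Jerry"'

# HTML text context: markup characters become entities.
as_html_text = html.escape(raw)

# JSON context (for example an API response body): quotes are backslash-escaped.
as_json = json.dumps(raw)

# URL component context: everything outside the unreserved set is percent-encoded.
as_url_component = quote(raw, safe="")

print(as_html_text)
print(as_json)
print(as_url_component)
```

Each context needs its own escaping rules, and none of the three outputs is interchangeable with another. That is only possible because the stored value was left raw.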

The drawbacks are that you must never forget to add the filtering, which is a little harder than with input filtering, and that it uses a bit more CPU when serving data.

This does not mean that you can skip validation checks; you still need them. The difference is that you don't save filtered data: you validate it and give the user an error message if the data is somehow invalid.
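
A small sketch of validating without filtering, in Python. The `validate_age` function, its 0 to 150 range, and the error messages are all hypothetical; the point is that invalid input is rejected with a message rather than silently rewritten.

```python
from typing import Optional, Tuple


def validate_age(raw: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (value, None) when valid, or (None, error message) when not."""
    text = raw.strip()
    # Accept ASCII digits only; anything else is rejected, not "cleaned up".
    if not (text.isascii() and text.isdigit()):
        return None, "Age must be a whole number."
    value = int(text)
    if not 0 <= value <= 150:
        return None, "Age must be between 0 and 150."
    return value, None


print(validate_age("42"))
print(validate_age("42abc"))
print(validate_age("200"))
```

A filtering approach would turn `"42abc"` into `42` and save it; the validating approach tells the user the value is wrong and saves nothing.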

So instead of "filter your inputs", maybe it should be "validate your inputs, filter your outputs".

So should I go with "input validation and filtering" or "input validation and output filtering"?
