Django: Preventing external query string requests

Posted 2024-12-27 22:06:14

I've been doing web development for a few months now and keep having this nagging problem. It is typical for pages to request content with a query string which usually contains meaningful data such as an id in the database. An example would be a link such as:

http://www.example.com/posts?id=5

I've been trying to think of a good strategy to prevent users from manually entering a value for the id without having accessed it from a link; I'd only like to acknowledge requests made through links presented on my website. Also, the website may not have an authentication system and allows anonymous browsing. That said, the information isn't particularly sensitive, but I still don't like the idea of not being able to control access to certain information. One option, I suppose, would be to use HTTP POST requests for this kind of page. I don't believe a user can simulate a POST request, but I may be wrong.

Furthermore, the user could enter any arbitrary number for the id and end up requesting a record that doesn't exist in the database. Of course, I could validate the requested id, but then I would be wasting resources on this check.

Any thoughts? I'm working with django but a general strategy for any programming language would be good. Thanks.

狼性发作 2025-01-03 22:06:14

First, choosing between GET and POST: A user can simulate any kind of request, so POST will not help you there. When choosing between the two it is best to decide based on the action the user is taking or how they are interacting with your content. Are they getting a page or sending you data (a form is the obvious example)? For your case of retrieving some sort of post, GET is appropriate.
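To make the point about POST concrete, here is a minimal sketch using Python's standard `urllib.request`. It only builds the request and does not send it. Any client can produce a POST carrying whatever form fields it likes. The URL is the question's example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build (but do not send) a POST that looks exactly like a form submission.
body = urlencode({"id": 5}).encode("ascii")
forged_post = Request("http://www.example.com/posts", data=body)

# urllib picks POST automatically because a request body was supplied.
print(forged_post.get_method())  # POST
print(forged_post.data)          # b'id=5'
```

curl, browser developer tools, or a few lines of script can do the same. The server only sees the bytes that arrive, not where they came from.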

Also worth noting: GET is the correct choice if the content is appropriate for bookmarking. Serving a URL based solely on the referrer (what you describe as "prevent users from manually entering a value for the id without having accessed it from a link") is a terrible idea. The Referer header is sent by the client, so it can be forged, and browsers do not send it at all for bookmarks or typed-in URLs. This will cause you innumerable headaches and is probably not a nice experience for the user.
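The same applies to the Referer header, which is why a referrer check is not access control. In this sketch, `came_from_my_site` is a hypothetical naive check of the kind the question has in mind. A hand-built request passes it, while a genuine visitor arriving from a bookmark (no Referer) fails it.

```python
from urllib.request import Request

SITE = "http://www.example.com/"

def came_from_my_site(referer):
    # Hypothetical naive check: trust any request whose Referer is our site.
    return referer is not None and referer.startswith(SITE)

# A hand-typed URL with a hand-typed Referer header passes the check...
forged = Request(
    "http://www.example.com/posts?id=5",
    headers={"Referer": SITE + "posts/"},
)
print(came_from_my_site(forged.get_header("Referer")))  # True

# ...while a real visitor opening a bookmark sends no Referer and fails it.
print(came_from_my_site(None))  # False
```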

As a general principle, avoid relying on the primary key of a database record. That key (id=5 in your case) should be treated purely as an auto-increment field that prevents record collisions, i.e. it guarantees that every record in the table has a unique field. That ID field is a backend utility. Don't expose it to your users and don't rely on it yourself.

If you can't use ID, what do you use? A common idiom is using the date of the record, a slug or both. If you are dealing with posts, use the published/created date. Then add a text field that will hold URL friendly and descriptive words. Call it a slug and read about Django's models.SlugField for more information. Also, see the URL of an article on basically any news site. Your final URL will look something like http://www.example.com/posts/2012/01/19/this-is-cool/
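Here is a sketch of the date-plus-slug scheme, assuming the `/posts/YYYY/MM/DD/slug/` layout from the example URL. `slugify` is a simplified stdlib stand-in for Django's own `django.utils.text.slugify`. `post_path` and `parse_post_path` are hypothetical helpers.

```python
import re
import unicodedata
from datetime import date

def slugify(title):
    # Simplified stand-in for django.utils.text.slugify:
    # strip accents, drop punctuation, join words with hyphens.
    value = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")

def post_path(published, slug):
    # Hypothetical URL scheme: /posts/YYYY/MM/DD/slug/
    return "/posts/%04d/%02d/%02d/%s/" % (published.year, published.month, published.day, slug)

POST_PATH_RE = re.compile(r"^/posts/(\d{4})/(\d{2})/(\d{2})/([-\w]+)/$")

def parse_post_path(path):
    # Return (date, slug) for a well-formed path, or None.
    m = POST_PATH_RE.match(path)
    if m is None:
        return None
    year, month, day, slug = m.groups()
    try:
        return date(int(year), int(month), int(day)), slug
    except ValueError:  # impossible dates such as 2012/02/30
        return None

slug = slugify("This is cool!")
path = post_path(date(2012, 1, 19), slug)
print(path)                   # /posts/2012/01/19/this-is-cool/
print(parse_post_path(path))  # (datetime.date(2012, 1, 19), 'this-is-cool')
```

In a Django project you would instead store the slug in a `SlugField`, fill it with `django.utils.text.slugify`, and let a URL pattern such as `path("posts/<int:year>/<int:month>/<int:day>/<slug:slug>/", ...)` do the parsing.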

Now your URL is easy on the eyes, has SEO benefits, is bookmarkable, and is much harder to guess than a sequential id. Because you aren't relying on a fixed, arbitrary ID from the back-end database, you have the freedom to restore a backup db dump, move databases, change the auto-increment numeric ID to a UUID, whatever. Only your database will care, not you as a programmer and not your users.

Oh, and don't over-worry about a user "requesting a record that doesn't exist" or about "validating the requested id"; you have to do that anyway. It isn't consuming unnecessary resources: it is how a database-backed website works. You have to connect the request to the data, and if the request doesn't match any data, you return a 404. Your web server does this for non-existent URLs, and you'll need to do it for non-existent data. Check out Django's get_object_or_404() for ideas/implementation.
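The sketch below shows why validation costs nothing extra: the query you need in order to render the page already is the check. It uses stdlib `sqlite3` with an in-memory table. `NotFound`, `get_post_or_404` and `handle` are hypothetical stand-ins for Django's `Http404`, `get_object_or_404(Post, pub_date=..., slug=...)` and a view.

```python
import sqlite3
from datetime import date

class NotFound(Exception):
    """Hypothetical stand-in for django.http.Http404."""

def get_post_or_404(conn, published, slug):
    # The lookup you need anyway doubles as the validation:
    # no row means the URL points at nothing, so answer 404.
    row = conn.execute(
        "SELECT title, body FROM posts WHERE pub_date = ? AND slug = ?",
        (published.isoformat(), slug),
    ).fetchone()
    if row is None:
        raise NotFound()
    return row

def handle(conn, published, slug):
    # Hypothetical view: map the outcome to an HTTP status code.
    try:
        title, _body = get_post_or_404(conn, published, slug)
    except NotFound:
        return 404, "Not Found"
    return 200, title

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, pub_date TEXT, slug TEXT, title TEXT, body TEXT)")
# A unique index on (pub_date, slug) keeps the lookup cheap and
# prevents two posts from sharing one URL.
conn.execute("CREATE UNIQUE INDEX posts_date_slug ON posts (pub_date, slug)")
conn.execute(
    "INSERT INTO posts (pub_date, slug, title, body) VALUES (?, ?, ?, ?)",
    ("2012-01-19", "this-is-cool", "This is cool", "..."),
)

print(handle(conn, date(2012, 1, 19), "this-is-cool"))  # (200, 'This is cool')
print(handle(conn, date(2012, 1, 19), "no-such-post"))  # (404, 'Not Found')
```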

像极了他 2025-01-03 22:06:14

There are two ways I know of to do this effectively, since there is basically no way to stop someone from forging any request.

The first is not to use bare IDs in the query parameters. Instead, generate a large random number, and make the link out of that. You will have to keep a table in your database mapping your random numbers to the actual IDs they represent, and you will have to clean the table eventually. This is fairly simple to implement, but requires some storage space, and some management of the stored data occasionally.
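Here is a sketch of the first method under stated assumptions. A dict stands in for the database table, and `make_token`, `resolve_token` and `purge_tokens` are hypothetical names. The random value comes from Python's `secrets` module, which is meant for security-sensitive randomness, unlike `random`.

```python
import secrets
import time

# Hypothetical in-memory stand-in for the database table that maps
# random tokens to real record IDs: token -> (record_id, created_at).
_tokens = {}

def make_token(record_id, now=None):
    # 32 random bytes, URL-safe base64 encoded: infeasible to guess.
    token = secrets.token_urlsafe(32)
    _tokens[token] = (record_id, time.time() if now is None else now)
    return token

def resolve_token(token):
    # The real ID, or None if the token is unknown or has been purged (-> 404).
    entry = _tokens.get(token)
    return None if entry is None else entry[0]

def purge_tokens(max_age, now=None):
    # The periodic clean-up: drop mappings older than max_age seconds.
    now = time.time() if now is None else now
    stale = [t for t, (_, created) in _tokens.items() if now - created > max_age]
    for t in stale:
        del _tokens[t]
    return len(stale)

token = make_token(5, now=1000.0)
print("http://www.example.com/posts?t=" + token)
print(resolve_token(token))                           # 5
print(resolve_token("guessed"))                       # None
print(purge_tokens(max_age=3600, now=1000.0 + 7200))  # 1
print(resolve_token(token))                           # None
```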

The second method is to sign the data when you make a link. By appending a cryptographic signature to the data and verifying the signature when a request is made, you ensure that only your web service (the only holder of the secret key) could have created the link. Even if the request itself is 'forged' (perhaps bookmarked, written down, or copied and pasted into another browser), you know that your site has already authorized that URL.

To do this, you need to create a Message Authentication Code (MAC) with the data that you are signing (say, just the 'id' value, or possibly the id and the time that you signed the data) and with a secret key that you keep only on your server.

In your view, then, you take the id value (or id and timestamp, if that's what you're using), construct the MAC again, and compare it with the MAC in the URL. If there's any difference, you reject the request as having been tampered with.

Look at the Python docs for the hmac and hashlib modules for all of the details.

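You could generate a link in Python like this:

settings.py:

# Django only exposes UPPERCASE names through django.conf.settings.
# Use a long random secret here and keep it out of version control.
HMAC_SECRET_KEY = b'replace-with-a-long-random-secret'

views.py:

import hashlib
import hmac
import time
from urllib.parse import urlencode

from django.conf import settings
from django.core.exceptions import PermissionDenied
from django.http import HttpResponseBadRequest

def some_view(request):
    ...
    post_id = 5               # avoid shadowing the built-in id()
    ts = int(time.time())     # don't shadow the time module
    mac = hmac.new(
        settings.HMAC_SECRET_KEY,             # key must be bytes
        ('%d/%d' % (post_id, ts)).encode(),   # message must be bytes too
        hashlib.sha256)
    # Note the '?' that starts the query string.
    url = 'http://www.example.com/posts?' + urlencode(
        {'id': post_id, 'ts': ts, 'mac': mac.hexdigest()})
    # Now return a template with that url in it somewhere

To verify it in another view (same imports), you would use code like this (warning: still a sketch, more error checking to do):

def posts_view(request):
    try:
        post_id = int(request.GET['id'])
        ts = int(request.GET['ts'])
        mac_from_url = request.GET['mac']
    except (KeyError, ValueError):
        return HttpResponseBadRequest('Missing or malformed link')

    computed_mac = hmac.new(
        settings.HMAC_SECRET_KEY,
        ('%d/%d' % (post_id, ts)).encode(),   # sign ts, not the time module
        hashlib.sha256).hexdigest()

    # Constant-time comparison; encode so odd input can't raise TypeError.
    if not hmac.compare_digest(mac_from_url.encode(), computed_mac.encode()):
        raise PermissionDenied('Link has been tampered with')

    # Now you know that the request is legit.
    # You can check the timestamp here, too, if you like.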
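The view leaves the timestamp check open. Below is a framework-free sketch of the full round trip with an expiry check. It is an illustration, not the answer's code. `sign_link`, `verify_link`, `_mac`, the `max_age` default and the hard-coded demo key are assumptions. Django also ships `django.core.signing.TimestampSigner`, whose `unsign(value, max_age=...)` covers the same ground.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"demo-only-secret"  # assumption: load the real secret from settings/env

def _mac(post_id, ts):
    # HMAC-SHA256 over "id/timestamp", hex encoded.
    msg = ("%d/%d" % (post_id, ts)).encode("ascii")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()

def sign_link(post_id, now=None):
    ts = int(time.time() if now is None else now)
    query = urlencode({"id": post_id, "ts": ts, "mac": _mac(post_id, ts)})
    return "http://www.example.com/posts?" + query

def verify_link(url, max_age=3600, now=None):
    """Return the post id if the link is authentic and fresh, else None."""
    params = parse_qs(urlsplit(url).query)
    try:
        post_id = int(params["id"][0])
        ts = int(params["ts"][0])
        mac = params["mac"][0]
    except (KeyError, ValueError):
        return None  # missing or malformed parameters
    # Compare as bytes so unexpected input cannot raise; constant time either way.
    if not hmac.compare_digest(mac.encode(), _mac(post_id, ts).encode()):
        return None  # tampered with, or signed with a different key
    now = time.time() if now is None else now
    if not 0 <= now - ts <= max_age:
        return None  # expired, or timestamped in the future
    return post_id

url = sign_link(5, now=1_000_000)
print(verify_link(url, now=1_000_010))                           # 5
print(verify_link(url.replace("id=5", "id=6"), now=1_000_010))   # None
print(verify_link(url, now=1_000_000 + 7200))                    # None
```

Signing proves that the URL was issued by your site, not who is using it. Anyone holding a signed link can use it until it expires, which is why the timestamp check is worth adding.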
