How do I find the size of a db.Model instance in GAE Python before calling db.put()?

Posted 2024-10-19 13:59:02


I'm writing an optimizer for my application, so that db.put() is invoked as rarely as possible. I'm stuck with the following problem:

I have a number of classes derived from db.Model. Instances of those classes are stored in a list:

class DBPutter:
    data = []  # list of instances
    def add(self, model):
        # HERE I WANT TO CHECK THAT self.data IS NOT EXCEEDING 1 MB
        self.data.append(model)
        if len(self.data) == 1000:
            self.flush()  # actual call to db.put() using deferred

With this approach I receive a lot of RequestTooLargeError exceptions. How do I check that my data is not exceeding 1 MB?
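One way to do the check the question asks for is a size-aware buffer that flushes before the batch would cross the limit. The sketch below is plain Python so it runs anywhere: `estimated_size` uses `pickle` only as a stand-in; on App Engine you would measure the actual wire size with `len(db.model_to_protobuf(entity).Encode())`. The `flush_fn` parameter and the exact limits are illustrative assumptions, not the asker's code.

```python
import pickle

def estimated_size(entity):
    # Stand-in for serialized entity size. On App Engine, use
    # len(db.model_to_protobuf(entity).Encode()) instead.
    return len(pickle.dumps(entity))

MAX_BATCH_BYTES = 1000 * 1000  # stay under the ~1 MB request limit
MAX_BATCH_COUNT = 500

class DBPutter:
    def __init__(self, flush_fn):
        self.data = []            # instance attribute, not a shared class list
        self.size = 0             # running total of serialized bytes
        self.flush_fn = flush_fn  # e.g. a wrapper around db.put

    def add(self, model):
        model_size = estimated_size(model)
        # Flush *before* appending if this model would push us over a cap.
        if self.data and (self.size + model_size > MAX_BATCH_BYTES
                          or len(self.data) >= MAX_BATCH_COUNT):
            self.flush()
        self.data.append(model)
        self.size += model_size

    def flush(self):
        if self.data:
            self.flush_fn(self.data)
            self.data = []
            self.size = 0
```

Tracking a running byte total this way avoids re-measuring the whole buffer on every `add`.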


Comments (3)

白首有我共你 2024-10-26 13:59:02


Pympler has an asizeof method and should run in Python 2.5: http://code.google.com/p/pympler/

I think you're over-optimizing, though. If an instance is shut down before 1000 objects are in your putter, you could lose data. Also, I think using the deferred library with a large amount of data will result in at least two db.puts: one when the task is submitted (because the payload is over 10k), and one inside the task, actually writing your models.

岁月静好 2024-10-26 13:59:02


As per the 1.4.0 release notes:

  • Size and quantity limits on datastore batch get/put/delete operations have been removed. Individual entities are still limited to 1 MB, but your app may batch as many entities together for get/put/delete calls as the overall datastore deadline will allow for.

That said, using deferred for this is pointless: Task Queue payloads are limited to 10k, and if your deferred payload is bigger than that, it will create a datastore entity to store the payload in. As a result, it's doing a datastore operation anyway, so you may as well do it yourself.

If you're storing thousands of entities, though, you almost certainly want to be doing the whole process on the task queue in the first place, rather than in an interactive request.
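Inside such a task, a minimal way to keep each RPC within the datastore deadline is to split the entity list into fixed-size chunks. The helper below is a plain-Python sketch; the `chunked` name and the batch size of 500 are assumptions, not App Engine requirements.

```python
def chunked(entities, batch_size=500):
    """Yield successive fixed-size batches from a list of entities."""
    for i in range(0, len(entities), batch_size):
        yield entities[i:i + batch_size]

# On App Engine, inside a task-queue handler, you might then write:
#   for batch in chunked(models, 500):
#       db.put(batch)
```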

无风消散 2024-10-26 13:59:02


I don't work with GAE, but you could try to call sys.getsizeof on each of your models and verify that the sum is less than 1 MB.

Edit: See this ActiveState recipe for an alternative to sys.getsizeof, which should work in Python 2.5.
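In the spirit of that recipe, and only as a rough approximation of serialized size, a deep-size helper can recurse into containers that sys.getsizeof alone would not count. This sketch handles just the common built-in containers and tracks object ids to avoid double-counting shared or cyclic references:

```python
import sys

def total_size(obj, seen=None):
    """Approximate deep size in bytes: sys.getsizeof counts only the
    container itself, so recurse into common container types and sum."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already counted (shared or cyclic ref)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size
```

Note that this measures in-memory size, not the entity's serialized wire size, so treat the result as a conservative estimate only.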
