Storing 300 MB in memory on Google App Engine

I am using Google App Engine in Python. I have 5000 people in my database. The entire list of 5000 person objects takes up 300 MB of memory.

I have been trying to store this in memory using blobcache, a module written [here][1].

I am running into pickle "OutOfMemory" issues, and am looking for a solution that involves storing these 5000 objects into a database, and then retrieving them all at once.

My person model looks like this.

from google.appengine.ext import db

class PersonDB(db.Model):
    serialized = db.BlobProperty()    # pickled Person instance
    pid = db.StringProperty()         # record number, used to look the person up

Each person is an object that has many attributes and methods associated with it, so I decided to pickle each person object and store it as the serialized field. The pid just allows me to query the person by their id. My person looks something like this

class Person():
    def __init__(self, sex, mrn, age):
        self.sex = sex
        self.age = age  # exact age
        self.record_number = mrn
        self.locations = []

    def makeAgeGroup(self, ageStr):
        return int(ageStr)

    def addLocation(self, healthdistrict):
        self.locations.append(healthdistrict)
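
For reference, looking a single person up again by pid with the model above could be done roughly like the sketch below; get_person and person_id are illustrative names, not part of the original code, and PersonDB is assumed to be the model defined earlier:

import pickle

def get_person(person_id):
    # pid stores the same string as person.record_number, so we can filter on it
    row = PersonDB.all().filter('pid =', person_id).get()
    if row is None:
        return None
    # Unpickle the stored blob back into a Person instance
    return pickle.loads(row.serialized)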

When I store all 5000 people at once into my database, I get a Server 500 error. Does anyone know why? My code for this is as follows:

# people is my list of 5000 Person objects
def write_people(self, people):
    for person in people:
        personDB = PersonDB()
        personDB.serialized = pickle.dumps(person)
        personDB.pid = person.record_number
        personDB.put()
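
One likely reason for the 500 is the request deadline: the loop above makes 5000 separate datastore round trips, one put() per person. Below is a minimal sketch of the same write using batched puts; db.put() taking a list of entities is part of the old db API, but the 500-entity batch size and the restructured loop are assumptions, not something from the original code:

import pickle

from google.appengine.ext import db

def write_people(self, people, batch_size=500):
    # Build all the entities first, then write them in batches so each
    # datastore RPC covers many entities instead of one put() per person.
    entities = [PersonDB(serialized=pickle.dumps(p, pickle.HIGHEST_PROTOCOL),
                         pid=p.record_number)
                for p in people]
    for start in range(0, len(entities), batch_size):
        db.put(entities[start:start + batch_size])

Even batched, a write of this size may still be better run from a task queue than from a user-facing request.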

How would I retrieve all 5000 of these objects at once in my App Engine method?

My idea is to do something like this

def get_patients(self):
    # Get my list of 5000 people back from the database
    people_from_db = db.GqlQuery("SELECT * FROM PersonDB")
    people = []
    for person in people_from_db:
        people.append(pickle.loads(person.serialized))
    return people
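
As a variation on the idea above, an explicit fetch() pulls all rows in one call instead of iterating the query in small implicit batches; a rough sketch, where the 5000 limit simply matches the known number of entities:

import pickle

from google.appengine.ext import db

def get_patients(self):
    # Fetch every PersonDB row at once, then unpickle each stored blob
    rows = db.GqlQuery("SELECT * FROM PersonDB").fetch(5000)
    return [pickle.loads(row.serialized) for row in rows]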

Thanks for the help in advance, I've been stuck on this for a while!!

3 Answers

忆沫 2024-11-25 02:37:58

You should not have all 5000 users in memory at once. Only retrieve the one you need.

渔村楼浪 2024-11-25 02:37:58

For this size of data why not use a blobstore and memcache?

In terms of performance (from highest to lowest):

  • local instance memory (your data set is too large)
  • memcache (partition your data into several keys and you should be fine, and it's very fast! see the partitioning sketch after this answer)
  • blobstore + memcache (persist to blobstore rather than DB)
  • db + memcache (persist to db)

Check out the Google IO videos from this year; there is a great one on using the blobstore for exactly this sort of thing. There is a significant performance (and cost) penalty associated with the DB for some use cases.

(for the pedantic readers, the read performance of the last three will be effectively the same, but there are significant differences in write time/cost)
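
To make the memcache option above concrete: each memcache value is limited to roughly 1 MB, so a large pickled payload has to be split across several keys. A minimal sketch of that partitioning follows; the helper names, key scheme, and 900 KB chunk size are illustrative choices, not something from the answer:

import pickle

from google.appengine.api import memcache

CHUNK_SIZE = 900 * 1024  # stay safely below memcache's ~1 MB per-value limit

def cache_blob(key_prefix, data):
    # Pickle once, then spread the byte string over numbered memcache keys.
    blob = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
    chunks = {}
    index = 0
    for offset in range(0, len(blob), CHUNK_SIZE):
        chunks['%s:%d' % (key_prefix, index)] = blob[offset:offset + CHUNK_SIZE]
        index += 1
    chunks['%s:count' % key_prefix] = index
    memcache.set_multi(chunks)

def load_blob(key_prefix):
    count = memcache.get('%s:count' % key_prefix)
    if count is None:
        return None
    keys = ['%s:%d' % (key_prefix, i) for i in range(count)]
    parts = memcache.get_multi(keys)
    if len(parts) != count:
        return None  # a chunk was evicted; fall back to blobstore or the datastore
    return pickle.loads(''.join(parts[k] for k in keys))

If any chunk has been evicted, load_blob returns None and the caller would fall back to blobstore or the datastore and re-populate the cache.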

水水月牙 2024-11-25 02:37:58

You could also check out the PerformanceEngine project for App Engine:
https://github.com/ocanbascil/PerformanceEngine
