What is the easiest way to delete all Blobstore data?

Posted 2024-12-03 16:12:02

What is the best way to remove all blobs from the Blobstore? I'm using Python.

I have quite a lot of blobs and I'd like to delete them all. I'm
currently doing the following:

class deleteBlobs(webapp.RequestHandler): 
    def get(self): 
        all = blobstore.BlobInfo.all(); 
        more = (all.count()>0) 
        blobstore.delete(all); 
        if more: 
            taskqueue.add(url='/deleteBlobs',method='GET'); 

This seems to use a lot of CPU and, as far as I can tell, does
nothing useful.

Comments (2)

倒带 2024-12-10 16:12:02

I use this approach:

import datetime

from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util

# Number of blobs deleted per request; larger batches failed for me.
BATCH_SIZE = 400


class IndexHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello. Blobstore is being purged.\n\n')
        try:
            # Fetch one batch of BlobInfo records and delete them one by one.
            blobs = blobstore.BlobInfo.all().fetch(BATCH_SIZE)
            for blob in blobs:
                blob.delete()
            deleted = len(blobs)

            now = datetime.datetime.now()
            self.response.out.write(
                '%d items deleted at %s' % (deleted, now.strftime('%H:%M:%S')))

            # A full batch means more blobs may remain, so reload the page.
            if deleted == BATCH_SIZE:
                self.redirect('/purge')

        except Exception as e:
            self.response.out.write('Error is: ' + repr(e) + '\n')


APP = webapp.WSGIApplication(
    [
        ('/purge', IndexHandler),
    ],
    debug=True)


def main():
    util.run_wsgi_app(APP)


if __name__ == '__main__':
    main()

In my experience, deleting more than 400 blobs in one request fails, so the handler reloads after every batch of 400. I also tried blobstore.delete(query.fetch(400)), but I think there is a bug at the moment: nothing happened and nothing was deleted.

遗弃M 2024-12-10 16:12:02

You're passing the query object to the delete method, which will iterate over it fetching it in batches, then submit a single enormous delete. This is inefficient because it requires multiple fetches, and won't work if you have more results than you can fetch in the available time or with the available memory. The task will either complete once and not require chaining at all, or more likely, fail repeatedly, since it can't fetch every blob at once.

Also, calling count executes the query just to determine the count, which is a waste of time since you're going to try fetching the results anyway.

Instead, fetch results in batches using fetch, and delete each batch. Use cursors to position the next batch; this spares the query from iterating over all the 'tombstoned' records before it finds the first live one. Ideally, delete multiple batches per task, using a timer to decide when to stop and chain the next task.
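
The approach described above could look roughly like the sketch below. It is not tested against App Engine: the loop is written against two caller-supplied functions so it can be run anywhere, and the names purge_in_batches, fetch_batch, delete_batch and clock are made up for illustration. The commented wiring at the bottom assumes the Python 2 SDK, where BlobInfo.all() returns a db.Query that supports with_cursor() and cursor(), and the 20-second budget is an arbitrary choice.

```python
import time


def purge_in_batches(fetch_batch, delete_batch, cursor=None,
                     batch_size=100, time_budget=20.0, clock=time.time):
    """Delete batch after batch until the data or the time budget runs out.

    fetch_batch(cursor, batch_size) must return (items, next_cursor);
    delete_batch(items) must delete exactly those items.
    Returns (deleted_count, cursor, done).
    """
    deadline = clock() + time_budget
    deleted = 0
    while clock() < deadline:
        items, next_cursor = fetch_batch(cursor, batch_size)
        if not items:
            return deleted, cursor, True   # nothing left to delete
        delete_batch(items)
        deleted += len(items)
        cursor = next_cursor               # resume after this batch
    return deleted, cursor, False          # out of time: chain another task


# Hypothetical App Engine wiring (untested, Python 2 SDK assumed):
#   def fetch_batch(cursor, n):
#       query = blobstore.BlobInfo.all()
#       if cursor:
#           query.with_cursor(cursor)
#       return query.fetch(n), query.cursor()
#
#   def delete_batch(infos):
#       blobstore.delete([info.key() for info in infos])
#
#   deleted, cursor, done = purge_in_batches(fetch_batch, delete_batch, cursor)
#   if not done:
#       taskqueue.add(url='/deleteBlobs', params={'cursor': cursor})
```

When the function returns with done set to False, the returned cursor is what the next task would pass back in, so no task has to re-scan records that earlier tasks already deleted.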
