Recommended strategy for backing up the App Engine datastore

Posted 2024-12-15 05:22:35


Right now I use remote_api and appcfg.py download_data to take a snapshot of my database every night. It takes a long time (6 hours) and is expensive. Without rolling my own change-based backup (I'd be too scared to do something like that), what's the best option for making sure my data is safe from failure?

PS: I recognize that Google's data is probably way safer than mine. But what if one day I accidentally write a program that deletes it all?
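
For reference, the nightly job described above amounts to something like the sketch below; the application id, remote_api URL, and output filename are placeholders, and appcfg.py is assumed to be on the PATH.

import datetime
import subprocess

APP_ID = 'your-app-id'  # placeholder application id
OUTFILE = 'datastore-%s.dat' % datetime.date.today().isoformat()

# A full export of every entity via the bulkloader; this is the slow and
# expensive step described above.
subprocess.check_call([
    'appcfg.py', 'download_data',
    '--application=%s' % APP_ID,
    '--url=http://%s.appspot.com/_ah/remote_api' % APP_ID,
    '--filename=%s' % OUTFILE,
])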


一身仙ぐ女味 2024-12-22 05:22:35


I think you've pretty much identified all of your choices.

  1. Trust Google not to lose your data, and hope you don't accidentally instruct them to destroy it.
  2. Perform full backups with download_data, perhaps less frequently than once per night if it is prohibitively expensive.
  3. Roll your own incremental backup solution.

Option 3 is actually an interesting idea. You'd need a modification timestamp on all entities, and you wouldn't catch deleted entities, but otherwise it's very doable with remote_api and cursors.
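
For concreteness, here is a minimal sketch of the kind of timestamped model this relies on. The model name and the extra properties are made up for illustration; only the updated_at property matters.

from google.appengine.ext import db

class Article(db.Model):
  # Hypothetical model for illustration. auto_now=True makes the datastore
  # refresh updated_at on every put(), giving each entity the modification
  # timestamp an incremental backup can order and cursor over.
  title = db.StringProperty()
  body = db.TextProperty()
  updated_at = db.DateTimeProperty(auto_now=True)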

Edit:

Here's a simple incremental downloader for use with remote_api. Again, the caveats are that it won't notice deleted entities, and it assumes all entities store the last modification time in a property named updated_at. Use it at your own peril.

import os
import hashlib
import gzip
from google.appengine.api import app_identity
from google.appengine.ext.db.metadata import Kind
from google.appengine.api.datastore import Query
from google.appengine.datastore.datastore_query import Cursor

INDEX = 'updated_at'  # property every entity is assumed to store its last-modified time in
BATCH = 50            # entities fetched per round trip
DEPTH = 3             # levels of hash-prefix directories used to spread files out

# Backups are written under backups/<app-id>/<kind>/...
path = ['backups', app_identity.get_application_id()]
for kind in Kind.all():
  kind = kind.kind_name
  if kind.startswith('__'):
    # Skip built-in metadata kinds.
    continue
  while True:
    print 'Fetching %d %s entities' % (BATCH, kind)
    # Resume from the cursor saved by the previous run, if there is one.
    path.extend([kind, 'cursor.txt'])
    try:
      cursor = open(os.path.join(*path)).read()
      cursor = Cursor.from_websafe_string(cursor)
    except IOError:
      cursor = None
    path.pop()
    query = Query(kind, cursor=cursor)
    query.Order(INDEX)
    entities = query.Get(BATCH)
    for entity in entities:
      # Shard the output directories by a hash of the key so no single
      # directory accumulates too many files.
      digest = hashlib.sha1(str(entity.key())).hexdigest()
      for i in range(DEPTH):
        path.append(digest[i])
      try:
        os.makedirs(os.path.join(*path))
      except OSError:
        pass  # directory already exists
      # Each entity is stored as a gzipped XML file named after its key.
      path.append('%s.xml.gz' % entity.key())
      print 'Writing', os.path.join(*path)
      f = gzip.open(os.path.join(*path), 'wb')
      f.write(entity.ToXml())
      f.close()
      path = path[:-1-DEPTH]
    if entities:
      # Persist the cursor so the next run picks up where this one stopped.
      path.append('cursor.txt')
      f = open(os.path.join(*path), 'w')
      f.write(query.GetCursor().to_websafe_string())
      f.close()
      path.pop()
    path.pop()
    if len(entities) < BATCH:
      # Fewer than a full batch means we have reached the end of this kind.
      break
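
A practical note: the script has to run in a process where remote_api is already configured, for example by pasting it into remote_api_shell.py, or with a setup roughly like the sketch below (the hostname is a placeholder).

import getpass
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
  # Prompt once for the credentials used to authenticate the remote_api calls.
  return raw_input('Email: '), getpass.getpass('Password: ')

# 'your-app-id' is a placeholder; the deployed app must expose the remote_api
# handler (e.g. via the remote_api builtin in app.yaml).
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your-app-id.appspot.com')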