How do I bulk-upload data to the Google App Engine Datastore?

Posted 2024-07-16 18:27:43


I have about 4000 records that I need to upload to Datastore.

They are currently in CSV format. I'd appreciate if someone would
point me to or explain how to upload data in bulk to GAE.

4 answers

青柠芒果 2024-07-23 18:27:43


You can use the bulkloader.py tool:

The bulkloader.py tool included with
the Python SDK can upload data to your
application's datastore. With just a
little bit of set-up, you can create
new datastore entities from CSV files.
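A minimal loader definition, in the style of the legacy SDK docs, is a small Python file that bulkloader.py consumes via its --config_file flag (it is configuration for the tool, not a standalone script). The kind name, properties, and file names below are illustrative assumptions:

```python
# album_loader.py -- loader config for bulkloader.py.
# Kind, property names, and CSV layout here are assumptions for
# illustration; adapt them to your own model.
from google.appengine.ext import db
from google.appengine.tools import bulkloader

class Album(db.Model):
    title = db.StringProperty()
    release_year = db.IntegerProperty()

class AlbumLoader(bulkloader.Loader):
    def __init__(self):
        # Map CSV columns, in order, to entity properties, each with a
        # conversion function applied to the raw string value.
        bulkloader.Loader.__init__(self, 'Album',
                                   [('title', str),
                                    ('release_year', int)])

# bulkloader.py looks for this module-level list.
loaders = [AlbumLoader]
```

It would then be invoked roughly as `bulkloader.py --filename=albums.csv --kind=Album --config_file=album_loader.py --url=http://your-app-id.appspot.com/_ah/remote_api <app-directory>` (placeholder names throughout).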

七禾 2024-07-23 18:27:43


I don't have the perfect solution, but I suggest you have a go with the App Engine Console. App Engine Console is a free plugin that lets you run an interactive Python interpreter in your production environment. It's helpful for one-off data manipulation (such as initial data imports) for several reasons:

  1. It's the good old read-eval-print interpreter. You can do things one at a time instead of having to write the perfect import code all at once and running it in batch.
  2. You have interactive access to your own data model, so you can read/update/delete objects from the data store.
  3. You have interactive access to the URL Fetch API, so you can pull data down piece by piece.

I suggest something like the following:

  1. Get your data model working in your development environment
  2. Split your CSV records into chunks of under 1,000. Publish them somewhere like Amazon S3 or any other URL.
  3. Install App Engine Console in your project and push it up to production
  4. Log in to the console. (Only admins can use the console so you should be safe. You can even configure it to return HTTP 404 to "cloak" from unauthorized users.)
  5. For each chunk of your CSV:
    1. Use URLFetch to pull down a chunk of data
    2. Use the built-in csv module to chop up your data until you have a list of useful data structures (most likely a list of lists or something like that)
    3. Write a for loop, iterating through each data structure in the list:
      1. Create a data object with all correct properties
      2. put() it into the data store

You should find that after one iteration through #5, then you can either copy and paste, or else write simple functions to speed up your import task. Also, with fetching and processing your data in steps 5.1 and 5.2, you can take your time until you are sure that you have it perfect.

(Note, App Engine Console currently works best with Firefox.)
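Steps 5.1 through 5.3 above can be sketched as follows. This is a self-contained illustration: the two-column CSV layout and the `process_chunk` helper are assumptions, and the actual `urlfetch.fetch()` and datastore `put()` calls are left as comments, since they only exist inside App Engine.

```python
import csv

def process_chunk(chunk_text):
    """Parse one fetched CSV chunk into a list of property dicts.

    In a real App Engine Console session, chunk_text would come from
    urlfetch.fetch(chunk_url).content, and each dict would then be
    turned into a model instance and put() into the datastore.
    """
    reader = csv.reader(chunk_text.splitlines())
    entities = []
    for row in reader:
        # Assumed two-column layout (name, count); adapt to your model.
        entities.append({'name': row[0], 'count': int(row[1])})
    return entities

# Simulate one fetched chunk instead of calling URLFetch:
chunk = "alpha,1\nbeta,2"
print(process_chunk(chunk))
# -> [{'name': 'alpha', 'count': 1}, {'name': 'beta', 'count': 2}]
```

Because the console is interactive, you can run one chunk, inspect the result, and only then loop over the rest.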

守护在此方 2024-07-23 18:27:43


By using the remote API and operations on multiple entities. I will show an example on NDB using Python, where our Test.csv contains the following values separated by semicolons:

1;2;3;4
5;6;7;8

First we need to import modules:

import csv
from TestData import TestData
from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub

Then we need to configure the remote API stub:

remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')

For more information on using remote api have a look at this answer.

Then comes the main code, which basically does the following things:

  1. Opens the Test.csv file.
  2. Sets the delimiter. We are using semicolon.
  3. Then you have two different options to create a list of entities:
    1. Using map reduce functions.
    2. Using list comprehension.
  4. In the end you batch put the whole list of entities.

Main code:

# Open csv file for reading ('rb' is Python 2 style, which the App Engine SDK uses).
with open('Test.csv', 'rb') as csv_file:
    # Set delimiter.
    reader = csv.reader(csv_file, delimiter=';')

    # Option 1: reduce the 2D list into a 1D list, then map every element into an entity.
    test_data_list = map(lambda number: TestData(number=int(number)),
            reduce(lambda flat, row: flat + row, reader)
        )

    # Option 2: the same thing as a list comprehension. Use one option or
    # the other -- the reader is exhausted after a single pass, so running
    # both would leave test_data_list empty.
    # test_data_list = [TestData(number=int(number)) for row in reader for number in row]

    # Batch put the whole list into the datastore.
    ndb.put_multi(test_data_list)

The put_multi operation also takes care of batching an appropriate number of entities into each HTTP POST request.

Have a look at the documentation for more information.

掐死时间 2024-07-23 18:27:43


In later versions of the App Engine SDK, you can upload data using appcfg.py.

See the appcfg.py documentation.
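With the newer SDKs, the relevant appcfg.py action is upload_data, driven by a bulkloader.yaml configuration (which has to be written or generated separately). A rough invocation sketch, with placeholder file, kind, and app names:

```
appcfg.py upload_data \
    --config_file=bulkloader.yaml \
    --filename=data.csv \
    --kind=TestData \
    --url=http://your-app-id.appspot.com/_ah/remote_api \
    <app-directory>
```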
