Inserting bulk data using the Task Queue in GAE

Posted 2024-09-29 21:30:52

I am using Google App Engine to create a web application. The app has an entity whose records will be inserted by users through an upload facility. A user may select up to 5K rows (objects) of data. I am using the DataNucleus project as the JDO implementation. Here is the approach I am taking for inserting the data into the datastore.

  1. Data is read from the CSV, converted to entity objects, and stored in a list.
  2. The list is divided into smaller groups of roughly 300 objects each.
  3. Each group is serialized and stored in memcache, using a unique id as the key.
  4. For each group, a task is created and inserted into the queue along with the key. Each task calls a servlet that takes this key as an input parameter, reads the data from memcache, inserts it into the datastore, and deletes the data from memcache. (A sketch of steps 2-4 follows this list.)
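
For concreteness, here is a minimal sketch of what steps 2 to 4 could look like, using the App Engine low-level memcache and task queue APIs. `MyEntity`, the `upload-queue` queue name, and the `/insertWorker` URL are placeholders for whatever the app actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class UploadChunker {
    private static final int GROUP_SIZE = 300;

    // Splits the parsed entities into groups, caches each group under a
    // unique key, and enqueues one task per group carrying only the key.
    public static void enqueueGroups(List<MyEntity> entities) {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        Queue queue = QueueFactory.getQueue("upload-queue"); // placeholder queue name

        for (int i = 0; i < entities.size(); i += GROUP_SIZE) {
            int end = Math.min(i + GROUP_SIZE, entities.size());
            // Copy the sublist: subList() views are not Serializable.
            List<MyEntity> group = new ArrayList<MyEntity>(entities.subList(i, end));

            String key = "upload-" + UUID.randomUUID();
            cache.put(key, group); // MyEntity must implement Serializable

            queue.add(TaskOptions.Builder
                .withUrl("/insertWorker")   // placeholder worker URL
                .param("key", key));
        }
    }
}
```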

The queue has a maximum rate of 2/min and a bucket size of 1. The problem I am facing is that a task is not able to insert all 300 records into the datastore: out of 300, at most around 50 get inserted. I have validated the data after reading it back from memcache and am able to retrieve everything that was stored, so the loss happens at the persistence step. I am using the makePersistent method of the PersistenceManager to save data to the datastore. Can someone please tell me what the issue could be?
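
For reference, a minimal sketch of the worker servlet described above. `PMF` is an assumed PersistenceManagerFactory singleton holder, `MyEntity` a placeholder, and `makePersistentAll` is used here as one way to batch the writes instead of persisting objects one at a time.

```java
import java.io.IOException;
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class InsertWorkerServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String key = req.getParameter("key");
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

        @SuppressWarnings("unchecked")
        List<MyEntity> group = (List<MyEntity>) cache.get(key);
        if (group == null) {
            return; // entry evicted or already processed; nothing to do
        }

        PersistenceManager pm = PMF.get().getPersistenceManager(); // PMF: assumed singleton
        try {
            pm.makePersistentAll(group); // persist the whole group in one batched call
        } finally {
            pm.close();
        }
        cache.delete(key);
    }
}
```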

Also, I want to know whether there is a better way of handling bulk inserts/updates of records. I have used the BulkInsert tool, but in cases like this it does not satisfy the requirement.

Comments (1)

最终幸福 2024-10-06 21:30:52

This is a perfect use case for App Engine mapreduce. Mapreduce can read lines of text from a blob as input, and it will shard your input for you and execute it on the task queue.
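
A rough sketch of the shape such a mapper could take. This assumes the appengine-mapreduce Java library, where an input reader can split a blob into newline-delimited records and deliver one record per `map()` call; the exact class names and signatures (`MapOnlyMapper`, the `byte[]` record type) vary between library versions and should be treated as assumptions here, as should the `MyEntity` and `PMF` helpers.

```java
// Illustrative sketch only: assumes the appengine-mapreduce Java library,
// where a blob is split into newline-delimited records and each record is
// delivered to map() on the task queue. Exact signatures vary by version.
import javax.jdo.PersistenceManager;
import com.google.appengine.tools.mapreduce.MapOnlyMapper;

public class CsvRowMapper extends MapOnlyMapper<byte[], Void> {
    @Override
    public void map(byte[] record) {
        String line = new String(record);    // one CSV row per map() call
        MyEntity entity = parseCsvRow(line);
        PersistenceManager pm = PMF.get().getPersistenceManager(); // PMF: assumed singleton
        try {
            pm.makePersistent(entity);       // failed shards are retried by the framework
        } finally {
            pm.close();
        }
    }

    // Hypothetical helper: split the row and populate the entity fields.
    private MyEntity parseCsvRow(String line) {
        String[] fields = line.split(",");
        return new MyEntity(fields[0], fields[1]);
    }
}
```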

When you say that the bulkloader "will not satisfy the requirement", it would help if you said which requirement it doesn't satisfy. I presume in this case the issue is that you need non-admin users to be able to upload data.
