How big can an App Engine task payload be?

Published 2024-08-15 19:53:57


I'm using the new experimental task queue for Java App Engine, and I'm trying to create tasks that aggregate statistics in my datastore. I'm trying to count the number of UNIQUE values within all the entities (of a certain type) in my datastore. More concretely, say entity of type X has a field A. I want to count the NUMBER of unique values of A in my datastore.

My current approach is to create a task which queries for the first 10 entities of type X, creating a hashtable to store the unique values of A in, then passing this hashtable to the next task as the payload. This next task will count the next 10 entities and so on and so forth until I've gone through all the entities. During the execution of the last task, I'll count the number of keys in my hashtable (that's been passed from task to task all along) to find the total number of unique values of A.
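For illustration, here is a minimal Python sketch of this chaining approach (the answers below use the Python datastore API, so Python is used throughout; the handler URL, model name, and field name are assumptions, not actual code from the question):

```python
import logging
import pickle

from google.appengine.api import taskqueue
from google.appengine.ext import db, webapp


class X(db.Model):
    """Assumed model from the question: entities of type X with a field A."""
    A = db.StringProperty()


class CountUniquesHandler(webapp.RequestHandler):
    """Hypothetical task handler that chains itself in batches of 10."""

    BATCH_SIZE = 10

    def post(self):
        # The accumulated set of values travels in the task payload;
        # this is exactly the payload whose size limit is in question.
        state = pickle.loads(self.request.body)
        seen, cursor = state['seen'], state['cursor']

        q = X.all()
        if cursor:
            q.with_cursor(cursor)
        batch = q.fetch(self.BATCH_SIZE)
        seen.update(e.A for e in batch)

        if len(batch) == self.BATCH_SIZE:
            # More entities may remain: chain the next task, carrying the set.
            taskqueue.add(url='/tasks/count-uniques',
                          payload=pickle.dumps({'seen': seen,
                                                'cursor': q.cursor()}))
        else:
            # Last batch: the set size is the number of unique values of A.
            logging.info('Unique values of A: %d', len(seen))
```

The chain would be kicked off with an empty set, e.g. `taskqueue.add(url='/tasks/count-uniques', payload=pickle.dumps({'seen': set(), 'cursor': None}))`.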

This works for a small number of entities in my datastore. But I'm worried that this hashtable will get too big once I have a lot of unique values. What is the maximum allowable size for the payload of an App Engine task?

Can you suggest any alternative approaches?

Thanks.


Comments (3)

酒与心事 2024-08-22 19:53:57


"Can you suggest any alternative approaches?".

Create an entity for each unique value, by constructing a key based on the value and using Model.get_or_insert. Then Query.count up the entities in batches of 1000 (or however many you can count before your request times out - more than 10), using the normal paging tricks.
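A minimal sketch of this first suggestion, assuming a hypothetical UniqueA marker model (the key-paging loop below stands in for the "normal paging tricks"):

```python
from google.appengine.ext import db


class UniqueA(db.Model):
    """One marker entity per distinct value of A; the key encodes the value."""
    pass


def record_value(value):
    # get_or_insert is transactional, so two tasks seeing the same value
    # concurrently still produce exactly one marker entity.
    UniqueA.get_or_insert('A:%s' % value)


def count_uniques():
    """Count the markers in batches of 1000 via key-based paging."""
    total = 0
    q = UniqueA.all(keys_only=True).order('__key__')
    batch = q.fetch(1000)
    while batch:
        total += len(batch)
        q = (UniqueA.all(keys_only=True)
             .order('__key__')
             .filter('__key__ >', batch[-1]))
        batch = q.fetch(1000)
    return total
```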

Or use code similar to that given in the docs for get_or_insert to keep count as you go - App Engine transactions can be run more than once, so a memcached count incremented in the transaction would be unreliable. There may be some trick around that, though, or you could keep the count in the datastore provided that you aren't doing anything too unpleasant with entity parents.
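And a sketch of the second suggestion, keeping the count in the datastore. Making each marker a child of the counter entity puts the check-insert and the increment in a single entity group, so one transaction covers both; the flip side (the "entity parents" caveat above) is that every write then funnels through that one group. All names here are hypothetical:

```python
from google.appengine.ext import db


class Counter(db.Model):
    count = db.IntegerProperty(default=0)


class UniqueA(db.Model):
    """Marker per distinct value, parented under the counter so that the
    check-insert and the increment share one entity group."""
    pass


def record_value(value):
    counter_key = db.Key.from_path('Counter', 'uniques-of-A')

    def txn():
        if UniqueA.get_by_key_name('A:%s' % value, parent=counter_key):
            return  # value already counted
        counter = (Counter.get(counter_key)
                   or Counter(key_name='uniques-of-A'))
        UniqueA(key_name='A:%s' % value, parent=counter_key).put()
        counter.count += 1
        counter.put()

    # Transactions may be retried, but unlike a memcache increment the
    # datastore writes commit exactly once.
    db.run_in_transaction(txn)
```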

秉烛思 2024-08-22 19:53:57


This may be too late, but perhaps it can be of use. First, anytime there's a remote chance you'll want to walk serially through a set of entities, I suggest using an indexed date_created or date_modified auto-update field. From this point you can create a model with a TextProperty to store your hash table using json.dumps(). All you need to do is pass the last date processed and the model id of the hash-table entity. Do a query with date_created later than the last date, json.loads() the TextProperty, and accumulate the next 10 records. This could get a bit more sophisticated (e.g. handle date_created collisions by utilizing the parameters passed and a slightly different query approach). Add a 1-second countdown to the next task to avoid any issues with updating the hash-table entity too quickly. HTH, -stevep
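A rough sketch of this approach, with hypothetical names. It assumes the hash-table entity was created up front with an empty table and its id seeded into the first task, and it ignores date_created collisions (the refinement mentioned above):

```python
import datetime
import json

from google.appengine.api import taskqueue
from google.appengine.ext import db

DATE_FMT = '%Y-%m-%dT%H:%M:%S.%f'


class X(db.Model):
    A = db.StringProperty()
    date_created = db.DateTimeProperty(auto_now_add=True)  # indexed by default


class HashTableEntity(db.Model):
    """One persistent entity holding the accumulated unique values as JSON."""
    table = db.TextProperty()


def process_batch(ht_id, last_date_str):
    ht = HashTableEntity.get_by_id(ht_id)
    seen = set(json.loads(ht.table))

    last_date = datetime.datetime.strptime(last_date_str, DATE_FMT)
    q = X.all().filter('date_created >', last_date).order('date_created')
    batch = q.fetch(10)

    if not batch:
        return len(seen)  # done: total number of unique values of A

    seen.update(e.A for e in batch)
    ht.table = json.dumps(list(seen))
    ht.put()

    # countdown=1 spaces out the writes to the same hash-table entity.
    taskqueue.add(url='/tasks/accumulate',
                  params={'ht_id': str(ht_id),
                          'last_date': batch[-1].date_created.strftime(DATE_FMT)},
                  countdown=1)
```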
