google appengine mapper - mapping over a date range
I would like to use the appengine mapper to iterate over a range of dates (from-date and to-date passed as properties to the configuration). For each date in the range, I would retrieve the entities that have this date as a property and operate on this set.
For example, if I have the following set of entities:
Key Date Value
a 2011/09/09 323
b 2011/09/09 132
c 2011/09/08 354
d 2011/09/08 432
e 2011/09/08 234
f 2011/09/07 423
g 2011/09/07 543
I would like to specify a date range of 2011/09/09 - 2011/09/07 which would create three mapper instances, for 2011/09/09, 2011/09/08 and 2011/09/07. In turn these would query for entities a+b, c+d+e and f+g respectively, and perform some operations on the values. (Each of the mappers would also make other datastore queries for additional data, hence the 'bonus question' below)
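The one-shard-per-day expansion described above can be sketched in plain Java (the class and method names here are illustrative, not part of the App Engine mapper API; each returned date would correspond to one mapper instance):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DateRangeShards {
    // Expand an inclusive from/to date range (from >= to, newest first,
    // matching the question's 2011/09/09 - 2011/09/07 example) into one
    // entry per day; each entry would become one mapper shard, which
    // would then query for the entities carrying that date.
    public static List<LocalDate> shardDates(LocalDate from, LocalDate to) {
        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = from; !d.isBefore(to); d = d.minusDays(1)) {
            dates.add(d);
        }
        return dates;
    }
}
```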
Presumably I need to create a custom InputFormat class; however, I'm quite new to mapreduce/hadoop and I was hoping someone had some examples?
Bonus question: is it "bad form" to use a dao to load data in a mapper? Other distributed computing platforms I have worked with (e.g. DataSynapse) would require that you parcel all inputs up and provide them with the task, to prevent too much contention on a dataserver. However, with the appengine HR datastore I presume this isn't a concern?
It's not currently possible to iterate over a subset of entities of a given kind in App Engine's mapreduce implementation. If the entities make up a large proportion of the data, you can simply iterate over everything and ignore the unwanted entities; if they only make up a small proportion, you will have to roll your own update procedure using the task queue.
Based on Nick Johnson's answer, you will need to retrieve your date range from the context using custom parameters. The mapper then filters out (ignores) any entity that falls outside the range before processing it.
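That filtering step can be sketched as a plain predicate the map() method would consult before doing any work (the from/to fields are assumptions; in the real mapper they would be parsed from the job's custom parameters via the context):

```java
import java.time.LocalDate;

public class RangeFilter {
    private final LocalDate from; // newest date in range, e.g. 2011/09/09
    private final LocalDate to;   // oldest date in range, e.g. 2011/09/07

    public RangeFilter(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    // Returns true if the entity's date falls inside the configured
    // inclusive range; the mapper would simply return without doing
    // anything for entities where this is false.
    public boolean inRange(LocalDate entityDate) {
        return !entityDate.isAfter(from) && !entityDate.isBefore(to);
    }
}
```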
But if you insist on mapping across all entities of a given kind, there is a workaround that may or may not be feasible depending on your requirements. Suppose your date ranges are fairly fixed (sounds unlikely, but just maybe). Then for each expected range you create a corresponding child entity kind, with a parent key pointing to the main entity (or just a reference, but a parent key works better for consistency - think transactions across an entity group).
Thus each entity in the range receives a child entity of the kind corresponding to that range. Then set up a mapper on the child entity kind corresponding to the range, and retrieve each child's parent to work on it.
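The range-to-kind mapping this workaround relies on might look like the following sketch (the kind-name scheme is entirely an assumption; any stable naming that the datastore can query by kind would do):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RangeKinds {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    // Derive a child entity kind name for a fixed date range. Each main
    // entity whose date falls in the range would get a child entity of
    // this kind, keyed with the main entity as parent; the mapper is
    // then configured to run over this kind alone.
    public static String kindFor(LocalDate from, LocalDate to) {
        return "ByDate_" + FMT.format(to) + "_" + FMT.format(from);
    }
}
```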
I do something similar, but in the opposite direction and with a single child entity kind, when populating data for the Relation Index Entity pattern. Hence the answer to your bonus question: go ahead and use a dao, or whatever your data layer consists of.
While the first approach is more sound, the latter may be feasible in cases where your ranges are not very dynamic and stay manageable. Given the schema-less nature of the datastore, creating new entity kinds is neither expensive nor bad practice.