如何使用“组”在 pymongo 中对相似的行进行分组?

发布于 2024-10-17 20:25:54 字数 733 浏览 2 评论 0原文

我对 mongodb/pymongo 很陌生。我已成功将数据导入 mongo,并希望使用 group 函数将相似的行分组在一起。例如,如果我的数据集如下所示:

data = [{uid: 1 , event: 'a' , time: 1} , 
        {uid: 1 , event: 'b' , time: 2} ,
        {uid: 2 , event: 'c' , time: 2} ,
        {uid: 3 , event: 'd' , time: 4}
       ]

如何使用 group 函数根据 uid 字段对上述行进行分组,使得输出如下?

 { {uid: 1} : [{uid: 1 , event: 'a' , time: 1} , {uid: 1 , event: 'b' , time: 2} ],
   {uid: 2} : [{uid: 2 , event: 'c' , time: 2} ],
   {uid: 3} : [{uid: 3 , event: 'd' , time: 4} ] }

我通读了 http://www.mongodb.org/display/DOCS/Aggregation。然而,在我看来,这些例子总是聚合成一个数字或对象。

谢谢,

I am very new to mongodb/pymongo. I have successfully imported my data into mongo and would like to use the group function to group similar row together. For example, if my data set looks like this:

data = [{uid: 1 , event: 'a' , time: 1} , 
        {uid: 1 , event: 'b' , time: 2} ,
        {uid: 2 , event: 'c' , time: 2} ,
        {uid: 3 , event: 'd' , time: 4}
       ]

How do I use the group function to group the above rows according to the uid field such that the output is as follows?

 { {uid: 1} : [{uid: 1 , event: 'a' , time: 1} , {uid: 1 , event: 'b' , time: 2} ],
   {uid: 2} : [{uid: 2 , event: 'c' , time: 2} ],
   {uid: 3} : [{uid: 3 , event: 'd' , time: 4} ] }

I read through the examples at http://www.mongodb.org/display/DOCS/Aggregation. However, it seems to me that those example always aggregate into a single number or object.

Thanks,

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

╭⌒浅淡时光〆 2024-10-24 20:25:54

您不需要使用reduce函数来实际减少任何东西。例如:

>>> coll.insert(dict(uid=1,event='a',time=1))
ObjectId('4d5b91d558839f06a8000000')
>>> coll.insert(dict(uid=1,event='b',time=2))
ObjectId('4d5b91e558839f06a8000001')
>>> coll.insert(dict(uid=2,event='c',time=2))
ObjectId('4d5b91f358839f06a8000002')
>>> coll.insert(dict(uid=3,event='d',time=4))
ObjectId('4d5b91fd58839f06a8000003')
>>> result = coll.group(['uid'], None,
                        {'list': []}, # initial
                        'function(obj, prev) {prev.list.push(obj)}') # reducer
>>> len(result) # will show three groups
3
>>> int(result[0]['uid'])
1
>>> result[0]['list']
[{u'event': u'a', u'_id': ObjectId('4d5b...0000'), u'uid': 1, u'time': 1},
 {u'event': u'b', u'_id': ObjectId('4d5b...0001'), u'uid': 1, u'time': 2}]
>>> int(result[1]['uid'])
2
>>> result[1]['list']
[{u'event': u'c', u'_id': ObjectId('4d5b...0002'), u'uid': 2, u'time': 2}]
>>> int(result[2]['uid'])
3
>>> result[2]['list']
[{u'event': u'd', u'_id': ObjectId('4d5b...0003'), u'uid': 3, u'time': 4}]

我缩短了上面列表中的对象 ID 以提高可读性。

You needn't use the reduce function to actually reduce anything. For example:

>>> coll.insert(dict(uid=1,event='a',time=1))
ObjectId('4d5b91d558839f06a8000000')
>>> coll.insert(dict(uid=1,event='b',time=2))
ObjectId('4d5b91e558839f06a8000001')
>>> coll.insert(dict(uid=2,event='c',time=2))
ObjectId('4d5b91f358839f06a8000002')
>>> coll.insert(dict(uid=3,event='d',time=4))
ObjectId('4d5b91fd58839f06a8000003')
>>> result = coll.group(['uid'], None,
                        {'list': []}, # initial
                        'function(obj, prev) {prev.list.push(obj)}') # reducer
>>> len(result) # will show three groups
3
>>> int(result[0]['uid'])
1
>>> result[0]['list']
[{u'event': u'a', u'_id': ObjectId('4d5b...0000'), u'uid': 1, u'time': 1},
 {u'event': u'b', u'_id': ObjectId('4d5b...0001'), u'uid': 1, u'time': 2}]
>>> int(result[1]['uid'])
2
>>> result[1]['list']
[{u'event': u'c', u'_id': ObjectId('4d5b...0002'), u'uid': 2, u'time': 2}]
>>> int(result[2]['uid'])
3
>>> result[2]['list']
[{u'event': u'd', u'_id': ObjectId('4d5b...0003'), u'uid': 3, u'time': 4}]

I've shortened the object IDs in the above listing to improve readability.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文