MongoDB group by: map twice before reduce
I have a collection of events = [timestamp, accountId, deviceId, rfid, ...]
- rfid is nullable, but every other field is non-nullable
- an rfid is reported through a deviceId
I need to find the state of every rfid in my system. At first glance this seems trivial if we map on {accountId, deviceId, rfid}. However, an rfid's state also depends on the events of the reporting device: when the device itself reports, it sets the rfid value to null (for example, when the device power cycles). How would I go about defining a single map function keyed on {accountId, deviceId, rfid} and then unioning its output with everything mapped to {accountId, deviceId, null}?
Right now I use LINQ as follows to get my desired dataset:
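events.GroupBy(e => new { e.deviceId, e.accountId }).Select(x => new {
    Key = x.Key,
    // each non-null rfid group, unioned with the device's null-rfid events
    Value = x.Where(y => y.rfid != null)
             .GroupBy(y => new { y.accountId, y.rfid })
             .Select(g => g.Union(x.Where(z => z.rfid == null)).ToList())
             .ToList()
});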
Comments (2)
Two map/reduce passes have to be made over the dataset:
1) map on {accountId, deviceId, rfid} => reduce to an array of events
2) map the results of (1) on {accountId, deviceId} => reduce to an array of latched rfids based on statusCode
A key thing to remember here is that the value parameter of emit (the 2nd parameter) should have the same structure as the result of reduce. This is because reduce is performed iteratively: its output can be fed back in as one of its input values. This was a pain point when generating the initial events array.
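The two passes can be sketched in plain JavaScript, roughly as below. This is only an illustration under stated assumptions: `runMapReduce` is a hypothetical in-memory stand-in (not a MongoDB API) that passes `emit` as an argument, and the latching rule uses timestamps (an rfid counts as latched if it was seen after the device's latest null-rfid event) because the statusCode logic is not shown here. Note how each map emits a value with the same shape its reduce returns.

```javascript
// In-memory stand-in for one map/reduce pass (hypothetical helper, not a MongoDB API).
// reduce is applied pairwise, so its own output is fed back in as an input value,
// similar to how MongoDB may re-reduce partial results.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      const id = JSON.stringify(key);
      if (!groups.has(id)) groups.set(id, { key, values: [] });
      groups.get(id).values.push(value);
    });
  }
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.reduce((acc, v) => reduce(key, [acc, v])),
  }));
}

// Pass 1: key {accountId, deviceId, rfid}; the emitted value already looks like the result.
const mapEvents = (e, emit) =>
  emit({ accountId: e.accountId, deviceId: e.deviceId, rfid: e.rfid }, { events: [e] });
const reduceEvents = (key, values) => ({ events: values.flatMap((v) => v.events) });

// Pass 2: key {accountId, deviceId}; value {resetAt, rfids} in both map and reduce.
const mapDevice = (r, emit) => {
  const last = Math.max(...r.value.events.map((e) => e.timestamp));
  const { accountId, deviceId, rfid } = r._id;
  emit(
    { accountId, deviceId },
    rfid === null
      ? { resetAt: last, rfids: [] }
      : { resetAt: -Infinity, rfids: [{ rfid, lastSeen: last }] }
  );
};
const reduceDevice = (key, values) => ({
  resetAt: Math.max(...values.map((v) => v.resetAt)),
  rfids: values.flatMap((v) => v.rfids),
});

// Assumed latching rule: keep rfids seen after the device's latest null-rfid event.
const latched = (v) => v.rfids.filter((r) => r.lastSeen > v.resetAt).map((r) => r.rfid);

// Made-up sample data for illustration.
const events = [
  { timestamp: 1, accountId: "a1", deviceId: "d1", rfid: "r1" },
  { timestamp: 2, accountId: "a1", deviceId: "d1", rfid: "r2" },
  { timestamp: 3, accountId: "a1", deviceId: "d1", rfid: null },
  { timestamp: 4, accountId: "a1", deviceId: "d1", rfid: "r2" },
  { timestamp: 1, accountId: "a2", deviceId: "d2", rfid: "r3" },
];

const pass1 = runMapReduce(events, mapEvents, reduceEvents);
const pass2 = runMapReduce(pass1, mapDevice, reduceDevice);
const status = pass2.map((r) => ({ ...r._id, rfids: latched(r.value) }));
console.log(JSON.stringify(status));
```

If `mapEvents` emitted the bare event instead of `{ events: [e] }`, `reduceEvents` would break as soon as it received one of its own results as input, which is the pain point described above.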
Well, not sure if this is a "can't see the forest for the trees" kind of problem. If you group by
{deviceId, accountId}
you already have both null and non-null rfids in the group. If I understood you right, each {deviceId, accountId} has a unique rfid; if so, just take the first non-null rfid in the group as the key and all the group's elements as the value.
If, on the other hand, a device/account combo can have multiple rfids, then you don't have a sound solution, since a nulled rfid could belong to any account, device, rfid combo.
Note: for this to work you must have at least one non-null rfid in each combo, otherwise the
First()
will crash and burn. Then again, if a combo has no non-null rfid, there is no way to know what it is in the first place. One option is to use FirstOrDefault, but then you'll get multiple null keys, one for each account/device combo without an rfid.
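The grouping described in this answer can be sketched as follows. This is a JavaScript illustration rather than the LINQ the answer refers to, and it assumes one rfid per {deviceId, accountId}. The `groupByDevice` helper and the sample events are hypothetical; `find` plays the role of FirstOrDefault, so a combo that never reported an rfid ends up with a null key instead of throwing.

```javascript
// Group events by {deviceId, accountId} and key each group by its first non-null rfid.
// Hypothetical helper for illustration; assumes one rfid per device/account combo.
function groupByDevice(events) {
  const groups = new Map();
  for (const e of events) {
    const id = JSON.stringify([e.deviceId, e.accountId]);
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(e);
  }
  return [...groups.values()].map((group) => {
    const firstTagged = group.find((e) => e.rfid !== null);
    // Like FirstOrDefault: the key is null when the combo never reported an rfid.
    return { key: firstTagged ? firstTagged.rfid : null, value: group };
  });
}

// Made-up sample data for illustration.
const sampleEvents = [
  { timestamp: 1, accountId: "a1", deviceId: "d1", rfid: null },
  { timestamp: 2, accountId: "a1", deviceId: "d1", rfid: "r1" },
  { timestamp: 3, accountId: "a1", deviceId: "d1", rfid: "r1" },
  { timestamp: 1, accountId: "a2", deviceId: "d2", rfid: null },
];

const grouped = groupByDevice(sampleEvents);
console.log(grouped.map((g) => [g.key, g.value.length]));
```

With this data the second combo has only null-rfid events, so it shows the "null key" case mentioned in the note above.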