How can I hierarchically group data using LINQ?
I have some data that has various attributes and I want to hierarchically group that data. For example:
public class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}
I would want this grouped as:
A1
 - B1
   - C1
   - C2
   - C3
   - ...
 - B2
   - ...
A2
 - B1
   - ...
...
Currently, I have been able to group this using LINQ such that the top group divides the data by A, then each subgroup divides by B, then each B subgroup contains subgroups by C, etc. The LINQ looks like this (assuming an IEnumerable<Data> sequence called data):
var hierarchicalGrouping =
    from x in data
    group x by x.A
        into byA
        let subgroupB = from x in byA
                        group x by x.B
                            into byB
                            let subgroupC = from x in byB
                                            group x by x.C
                            select new
                            {
                                B = byB.Key,
                                SubgroupC = subgroupC
                            }
        select new
        {
            A = byA.Key,
            SubgroupB = subgroupB
        };
As you can see, this gets somewhat messy the more subgrouping that's required. Is there a nicer way to perform this type of grouping? It seems like there should be and I'm just not seeing it.
Update
So far, I have found that expressing this hierarchical grouping by using the fluent LINQ APIs rather than query language arguably improves readability, but it doesn't feel very DRY.
There were two ways I did this: one using GroupBy with a result selector, the other using GroupBy followed by a Select call. Both could be formatted to be more readable than the query language, but they still don't scale well.
// Note: d.D assumes Data has a fourth property D, which the class above does not declare.
var withResultSelector =
    data.GroupBy(a => a.A, (aKey, aData) =>
        new
        {
            A = aKey,
            SubgroupB = aData.GroupBy(b => b.B, (bKey, bData) =>
                new
                {
                    B = bKey,
                    SubgroupC = bData.GroupBy(c => c.C, (cKey, cData) =>
                        new
                        {
                            C = cKey,
                            SubgroupD = cData.GroupBy(d => d.D)
                        })
                })
        });

var withSelectCall =
    data.GroupBy(a => a.A)
        .Select(aG =>
            new
            {
                A = aG.Key,
                SubgroupB = aG
                    .GroupBy(b => b.B)
                    .Select(bG =>
                        new
                        {
                            B = bG.Key,
                            SubgroupC = bG
                                .GroupBy(c => c.C)
                                .Select(cG =>
                                    new
                                    {
                                        C = cG.Key,
                                        SubgroupD = cG.GroupBy(d => d.D)
                                    })
                        })
            });
What I'd like...
I can envisage a couple of ways that this could be expressed (assuming the language and framework supported it). The first would be a GroupBy extension that takes a series of function pairs for key selection and result selection, Func<TElement, TKey> and Func<TElement, TResult>. Each pair describes the next sub-group. This option falls down because each pair would potentially require TKey and TResult to be different from the others, which would mean GroupBy would need a finite number of parameters and a complex declaration.
The second option would be a SubGroupBy extension method that could be chained to produce sub-groups. SubGroupBy would be the same as GroupBy but the result would be the previous grouping further partitioned. For example:
var groupings = data
    .GroupBy(x => x.A)
    .SubGroupBy(y => y.B)
    .SubGroupBy(z => z.C);

// This version has a custom result type that would be the grouping data.
// The element data at each stage would be the custom data at this point
// as the original data would be lost when projected to the results type.
var groupingsWithCustomResultType = data
    .GroupBy(a => a.A, x => new { ... })
    .SubGroupBy(b => b.B, y => new { ... })
    .SubGroupBy(c => c.C, c => new { ... });
The difficulty with this is how to implement the methods efficiently: as I currently understand it, each level would re-create new objects in order to extend the previous objects. The first iteration would create groupings of A, the second would then create objects that have a key of A and groupings of B, and the third would redo all that and add the groupings of C. This seems terribly inefficient (though I suspect my current options actually do this anyway). It would be nice if the calls passed around a meta-description of what was required and the instances were only created on the last pass, but that sounds difficult too. Note that this is similar to what can be done with GroupBy but without the nested method calls.
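To make the typing problem concrete, here is a minimal sketch of what a one-level SubGroupBy could look like. It is only an illustration: SubGroupExtensions, the sample data, and the Program class are hypothetical, and this version simply regroups by the outer key rather than doing anything clever about efficiency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

public static class SubGroupExtensions
{
    // Hypothetical one-level SubGroupBy: keeps each outer key and
    // regroups the elements of that outer group by a second key.
    public static IEnumerable<IGrouping<TKey, IGrouping<TSubKey, TElement>>>
        SubGroupBy<TKey, TSubKey, TElement>(
            this IEnumerable<IGrouping<TKey, TElement>> groups,
            Func<TElement, TSubKey> subKeySelector)
    {
        return groups
            // Pair every inner group with the key of the outer group it came from.
            .SelectMany(
                outer => outer.GroupBy(subKeySelector),
                (outer, inner) => new { outer.Key, Inner = inner })
            // Regroup the inner groups under their outer key.
            .GroupBy(pair => pair.Key, pair => pair.Inner);
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new[]
        {
            new Data { A = "A1", B = "B1", C = "C1" },
            new Data { A = "A1", B = "B1", C = "C2" },
            new Data { A = "A1", B = "B2", C = "C1" },
            new Data { A = "A2", B = "B1", C = "C1" },
        };

        foreach (var byA in data.GroupBy(x => x.A).SubGroupBy(x => x.B))
        {
            Console.WriteLine(byA.Key);
            foreach (var byB in byA)
                Console.WriteLine($" - {byB.Key} ({byB.Count()} items)");
        }
    }
}
```

With this signature a second chained .SubGroupBy(z => z.C) would not compile, because after the first call the element type is IGrouping<TSubKey, Data> rather than Data. Each extra level would need its own overload, which is the finite-depth problem again.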
Hopefully all that makes sense. I expect I am chasing rainbows here, but maybe not.
Update - another option
Another possibility that I think is more elegant than my previous suggestions relies on each parent group being just a key and a sequence of child items (as in the examples), much like IGrouping provides now. That means one option for constructing this grouping would be a series of key selectors and a single results selector.
If the keys were all limited to a set type, which is not unreasonable, then this could be generated as a sequence of key selectors and a results selector, or a results selector and a params array of key selectors. Of course, if the keys had to be of different types at different levels, this becomes difficult again except for a finite depth of hierarchy due to the way generics parameterization works.
Here are some illustrative examples of what I mean:
For example:
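// The `this` modifier must be on the first parameter of an extension method.
public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    IEnumerable<Func<TElement, TKey>> keySelectors,
    Func<TElement, TResult> resultSelector)
{
    ...
}

// The array needs an explicit delegate type; lambdas alone give `new []` nothing to infer from.
var hierarchy = data.SubgroupBy(
    new Func<Data, string>[] {
        x => x.A,
        y => y.B,
        z => z.C },
    a => new { /*custom projection here for leaf items*/ });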
Or:
public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector,
    params Func<TElement, TKey>[] keySelectors)
{
    ...
}

var hierarchy = data.SubgroupBy(
    a => new { /*custom projection here for leaf items*/ },
    x => x.A,
    y => y.B,
    z => z.C);
This does not solve implementation inefficiencies, but it should solve the complex nesting. However, what would the return type of this grouping be? Would I need my own interface or can I use IGrouping somehow? How much do I need to define, or does the variable depth of the hierarchy still make this impossible?
My guess is that this should be the same as the return type from any IGrouping call, but how does the type system infer that type if it isn't involved in any of the parameters that are passed?
This problem is stretching my understanding, which is great, but my brain hurts.
Comments (3)
Here is a description of how you can implement a hierarchical grouping mechanism.
From this description:
Result class:
Extension method:
Usage:
Edit:
Here is an improved and properly typed version of the code.
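The listings for the result class, extension method, and usage did not survive on this page. As a rough illustration of that shape only, a minimal sketch might look like the following. GroupResult, GroupByMany, and the sample data are hypothetical names chosen here, not the answerer's code, and keys are typed as object so that every level can share one selector type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

// Hypothetical result class: one node per group, holding its items and its child groups.
public sealed class GroupResult<TItem>
{
    public object Key { get; set; }
    public List<TItem> Items { get; set; }
    public List<GroupResult<TItem>> SubGroups { get; set; }
}

public static class HierarchicalGrouping
{
    // Hypothetical extension method: groups by the first selector,
    // then recurses into each group with the remaining selectors.
    public static List<GroupResult<TItem>> GroupByMany<TItem>(
        this IEnumerable<TItem> source,
        params Func<TItem, object>[] keySelectors)
    {
        if (keySelectors.Length == 0)
            return new List<GroupResult<TItem>>();

        var first = keySelectors[0];
        var rest = keySelectors.Skip(1).ToArray();

        return source
            .GroupBy(first)
            .Select(g => new GroupResult<TItem>
            {
                Key = g.Key,
                Items = g.ToList(),
                SubGroups = g.GroupByMany(rest)
            })
            .ToList();
    }
}

public static class Program
{
    private static void Print<TItem>(IEnumerable<GroupResult<TItem>> groups, int depth)
    {
        foreach (var group in groups)
        {
            Console.WriteLine($"{new string(' ', depth * 2)}- {group.Key} ({group.Items.Count})");
            Print(group.SubGroups, depth + 1);
        }
    }

    public static void Main()
    {
        var data = new[]
        {
            new Data { A = "A1", B = "B1", C = "C1" },
            new Data { A = "A1", B = "B1", C = "C2" },
            new Data { A = "A1", B = "B2", C = "C1" },
            new Data { A = "A2", B = "B1", C = "C1" },
        };

        // Usage: one selector per level, outermost first.
        Print(data.GroupByMany(x => x.A, x => x.B, x => x.C), 0);
    }
}
```

The trade-off in this sketch is that a single recursive node type sidesteps the variable-depth typing problem, at the cost of losing the static type of each key.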
You need a recursive function. The recursive function calls itself for each node in the tree.
To do this in LINQ, you can use a Y-combinator.
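As a hedged sketch of that idea: the Fix helper below is a fixed-point function in the spirit of a Y-combinator (it uses ordinary named recursion internally, so it is not a pure one), and it lets a lambda call itself once per tree node. Node, Recursion.Fix, TreeBuilder.Build, and the sample data are all hypothetical names introduced for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

// Hypothetical tree node: a group key plus the groups nested below it.
public sealed class Node
{
    public string Key { get; set; }
    public List<Node> Children { get; set; }
}

public static class Recursion
{
    // Hands the lambda a reference to itself (`self`) so it can recurse.
    public static Func<TArg, TResult> Fix<TArg, TResult>(
        Func<Func<TArg, TResult>, Func<TArg, TResult>> f)
    {
        return arg => f(Fix(f))(arg);
    }
}

public static class TreeBuilder
{
    public static List<Node> Build(
        IEnumerable<Data> data,
        params Func<Data, string>[] selectors)
    {
        // The state is the items of the current node and the index of the next selector.
        var build = Recursion.Fix<(IEnumerable<Data> Items, int Level), List<Node>>(
            self => state =>
                state.Level == selectors.Length
                    ? new List<Node>() // no selectors left: this node is a leaf
                    : state.Items
                        .GroupBy(selectors[state.Level])
                        .Select(g => new Node
                        {
                            Key = g.Key,
                            Children = self((g, state.Level + 1))
                        })
                        .ToList());

        return build((data, 0));
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new[]
        {
            new Data { A = "A1", B = "B1", C = "C1" },
            new Data { A = "A1", B = "B2", C = "C1" },
            new Data { A = "A2", B = "B1", C = "C1" },
        };

        foreach (var a in TreeBuilder.Build(data, d => d.A, d => d.B))
            Console.WriteLine($"{a.Key}: {string.Join(", ", a.Children.Select(b => b.Key))}");
    }
}
```

In everyday C# a local recursive function would do the same job with less ceremony; the combinator only matters when the recursion has to live inside a lambda expression.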
Here is my attempt at creating nested grouping. Maybe someone will find it useful.
Basic usage:
Note that grouped properties must be specified in reverse order. That's caused by a restriction of generics: GroupingBuilder<TKeyCurrent, TElementCurrent, TElementPrev, TElement> wraps the previous grouping type with a new one, so nesting can be done only in reverse order.
Usage with custom result selectors: