Is it possible to pivot data using LINQ?

Posted on 2024-07-07 08:46:04

I am wondering if it is possible to use LINQ to pivot data from the following layout:

CustID | OrderDate | Qty
1      | 1/1/2008  | 100
2      | 1/2/2008  | 200
1      | 2/2/2008  | 350
2      | 2/28/2008 | 221
1      | 3/12/2008 | 250
2      | 3/15/2008 | 2150

into something like this:

CustID | Jan-2008 | Feb-2008 | Mar-2008
1      | 100      | 350      | 250
2      | 200      | 221      | 2150

Comments (7)

迷爱 2024-07-14 08:46:04

Something like this?

List<CustData> myList = GetCustData();

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => new {
        CustId = g.Key,
        Jan = g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
        Feb = g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
        March = g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty)
    });

GroupBy in LINQ does not work the same way as GROUP BY in SQL. In SQL, you get the key and aggregates (row/column shape). In LINQ, you get the key and any elements as children of the key (hierarchical shape). To pivot, you must project the hierarchy back into a row/column form of your choosing.
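
To make the two shapes concrete, here is a minimal sketch. It first prints the hierarchical result of GroupBy (a key plus its child elements) and then projects each group into a flat row of twelve month totals. The names GroupShapeDemo, MonthlyTotals and Sample are made up for this illustration, and, like the query above, the sketch ignores the year.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustData
{
    public int CustId { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

public static class GroupShapeDemo
{
    // Hypothetical helper: projects each group (key + child elements) into a
    // flat row. Index 0 holds January's total, index 11 December's.
    // Assumption: all rows belong to the same year, as in the question.
    public static Dictionary<int, int[]> MonthlyTotals(IEnumerable<CustData> source)
    {
        return source
            .GroupBy(c => c.CustId)
            .ToDictionary(
                g => g.Key,
                g =>
                {
                    var totals = new int[12];
                    foreach (var c in g)
                        totals[c.OrderDate.Month - 1] += c.Qty;
                    return totals;
                });
    }

    // The sample rows from the question.
    public static List<CustData> Sample()
    {
        return new List<CustData>
        {
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 1, 1), Qty = 100 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 1, 2), Qty = 200 },
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 2, 2), Qty = 350 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 2, 28), Qty = 221 },
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 3, 12), Qty = 250 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 3, 15), Qty = 2150 },
        };
    }

    public static void Main()
    {
        var data = Sample();

        // Hierarchical shape: each group is a key with its child elements.
        foreach (var g in data.GroupBy(c => c.CustId))
        {
            Console.WriteLine("Key {0}:", g.Key);
            foreach (var c in g)
                Console.WriteLine("    {0:yyyy-MM-dd} {1}", c.OrderDate, c.Qty);
        }

        // Row/column shape: one row per customer, first three months shown.
        foreach (var row in MonthlyTotals(data))
            Console.WriteLine("{0} | {1}", row.Key, string.Join(" | ", row.Value.Take(3)));
    }
}
```

Accumulating into an array also walks each group only once, instead of once per month column.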

影子的影子 2024-07-14 08:46:04

I answered a similar question using a LINQ extension method:

// order s(ource) by OrderDate to have proper column ordering
var r = s.Pivot3(e => e.custID, e => e.OrderDate.ToString("MMM-yyyy")
    , lst => lst.Sum(e => e.Qty));
// order r(esult) by CustID

(+) generic implementation
(-) definitely slower than Amy B's

Can anyone improve my implementation, i.e. make the method itself do the ordering of columns and rows?
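
Pivot3 itself is not shown in this thread, so the following is only a sketch of what a generic pivot extension might look like. It is a hypothetical Pivot method, not the author's implementation. It returns nested SortedDictionary instances, so rows and columns come back ordered by their keys, which assumes both key types are comparable. The demo uses the first day of the month as the column key because a "MMM-yyyy" string would sort alphabetically rather than chronologically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PivotExtensions
{
    // Hypothetical helper (not the Pivot3 method mentioned above).
    // Groups by row key, then by column key, and aggregates each cell.
    // Assumption: TRow and TCol have a default comparer (e.g. int, DateTime).
    public static SortedDictionary<TRow, SortedDictionary<TCol, TValue>> Pivot<TSource, TRow, TCol, TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, TRow> rowSelector,
        Func<TSource, TCol> columnSelector,
        Func<IEnumerable<TSource>, TValue> aggregate)
    {
        var result = new SortedDictionary<TRow, SortedDictionary<TCol, TValue>>();
        foreach (var row in source.GroupBy(rowSelector))
        {
            var cells = new SortedDictionary<TCol, TValue>();
            foreach (var cell in row.GroupBy(columnSelector))
                cells[cell.Key] = aggregate(cell);
            result[row.Key] = cells;
        }
        return result;
    }
}

public static class PivotExtensionsDemo
{
    public static void Main()
    {
        var s = new[]
        {
            new { custID = 1, OrderDate = new DateTime(2008, 1, 1), Qty = 100 },
            new { custID = 2, OrderDate = new DateTime(2008, 1, 2), Qty = 200 },
            new { custID = 1, OrderDate = new DateTime(2008, 2, 2), Qty = 350 },
            new { custID = 2, OrderDate = new DateTime(2008, 2, 28), Qty = 221 },
        };

        var r = s.Pivot(
            e => e.custID,
            e => new DateTime(e.OrderDate.Year, e.OrderDate.Month, 1), // chronological column key
            lst => lst.Sum(e => e.Qty));

        foreach (var row in r)
        {
            var cells = row.Value.Select(c => c.Key.ToString("MMM-yyyy") + ": " + c.Value);
            Console.WriteLine("{0} | {1}", row.Key, string.Join(" | ", cells));
        }
    }
}
```

A row only contains the columns it has data for; callers that need a rectangular table would have to fill in the missing cells themselves.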

坏尐絯℡ 2024-07-14 08:46:04

The neatest approach for this, I think, is to use a lookup:

var query =
    from c in myList
    group c by c.CustId into gcs
    let lookup = gcs.ToLookup(y => y.OrderDate.Month, y => y.Qty)
    select new
    {
        CustId = gcs.Key,
        Jan = lookup[1].Sum(),
        Feb = lookup[2].Sum(),
        Mar = lookup[3].Sum(),
    };
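
One detail this relies on: indexing the lookup returned by ToLookup with a key that is not present yields an empty sequence rather than throwing, so Sum() is simply 0 for months without orders. A small sketch to illustrate (LookupDemo and ByMonth are names invented for this example, and the tuple syntax assumes C# 7 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Builds month -> quantities, mirroring the ToLookup call in the query above.
    public static ILookup<int, int> ByMonth(IEnumerable<(DateTime OrderDate, int Qty)> orders)
    {
        return orders.ToLookup(o => o.OrderDate.Month, o => o.Qty);
    }

    public static void Main()
    {
        var lookup = ByMonth(new[]
        {
            (new DateTime(2008, 1, 1), 100),
            (new DateTime(2008, 3, 12), 250),
        });

        Console.WriteLine(lookup[1].Sum());     // January total: 100
        Console.WriteLine(lookup[2].Sum());     // no February rows: empty sequence, so 0
        Console.WriteLine(lookup.Contains(2));  // False
    }
}
```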

花期渐远 2024-07-14 08:46:04

Here is a somewhat more generic way to pivot data using LINQ:

IEnumerable<CustData> s;
var groupedData = s.ToLookup( 
        k => new ValueKey(
            k.CustID, // 1st dimension
            String.Format("{0}-{1}", k.OrderDate.Month, k.OrderDate.Year // 2nd dimension
        ) ) );
var rowKeys = groupedData.Select(g => (int)g.Key.DimKeys[0]).Distinct().OrderBy(k=>k);
var columnKeys = groupedData.Select(g => (string)g.Key.DimKeys[1]).Distinct().OrderBy(k=>k);
foreach (var row in rowKeys) {
    Console.Write("CustID {0}: ", row);
    foreach (var column in columnKeys) {
        Console.Write("{0:####} ", groupedData[new ValueKey(row,column)].Sum(r=>r.Qty) );
    }
    Console.WriteLine();
}

where ValueKey is a special class that represents a multidimensional key:

public sealed class ValueKey {
    public readonly object[] DimKeys;
    public ValueKey(params object[] dimKeys) {
        DimKeys = dimKeys;
    }
    public override int GetHashCode() {
        if (DimKeys==null) return 0;
        int hashCode = DimKeys.Length;
        for (int i = 0; i < DimKeys.Length; i++) { 
            hashCode ^= DimKeys[i].GetHashCode();
        }
        return hashCode;
    }
    public override bool Equals(object obj) {
        if ( obj==null || !(obj is ValueKey))
            return false;
        var x = DimKeys;
        var y = ((ValueKey)obj).DimKeys;
        if (ReferenceEquals(x,y))
            return true;
        if (x.Length!=y.Length)
            return false;
        for (int i = 0; i < x.Length; i++) {
            if (!x[i].Equals(y[i]))
                return false;
        }
        return true;            
    }
}

This approach can be used for grouping by N dimensions (N > 2) and works fine for fairly small datasets. For large datasets (a million records or more), or for cases where the pivot configuration cannot be hardcoded, I have written a dedicated PivotData library (it is free):

var pvtData = new PivotData(new []{"CustID","OrderDate"}, new SumAggregatorFactory("Qty"));
pvtData.ProcessData(s, (o, f) => {
    var custData = (TT)o;
    switch (f) {
        case "CustID": return custData.CustID;
        case "OrderDate": 
        return String.Format("{0}-{1}", custData.OrderDate.Month, custData.OrderDate.Year);
        case "Qty": return custData.Qty;
    }
    return null;
} );
Console.WriteLine( pvtData[1, "1-2008"].Value );  
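
On compilers that support value tuples (C# 7 or later), a tuple can play the role of the hand-written ValueKey class, because tuples already implement value equality and hashing. This is a swapped-in alternative rather than part of the answer above, and TupleKeyPivot and Build are hypothetical names. The sketch also keeps year and month as numbers, so the column keys sort chronologically instead of as strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustData
{
    public int CustID { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

public static class TupleKeyPivot
{
    // Hypothetical helper: one lookup entry per (customer, year, month) cell.
    public static ILookup<(int CustID, int Year, int Month), int> Build(IEnumerable<CustData> source)
    {
        return source.ToLookup(
            c => (c.CustID, c.OrderDate.Year, c.OrderDate.Month),
            c => c.Qty);
    }

    public static void Main()
    {
        var s = new List<CustData>
        {
            new CustData { CustID = 1, OrderDate = new DateTime(2008, 1, 1), Qty = 100 },
            new CustData { CustID = 2, OrderDate = new DateTime(2008, 1, 2), Qty = 200 },
            new CustData { CustID = 1, OrderDate = new DateTime(2008, 2, 2), Qty = 350 },
            new CustData { CustID = 2, OrderDate = new DateTime(2008, 2, 28), Qty = 221 },
        };

        var cells = Build(s);
        var rowKeys = cells.Select(g => g.Key.CustID).Distinct().OrderBy(k => k).ToList();
        var columnKeys = cells
            .Select(g => (Year: g.Key.Year, Month: g.Key.Month))
            .Distinct()
            .OrderBy(k => k)   // tuples compare element by element: year first, then month
            .ToList();

        foreach (var row in rowKeys)
        {
            Console.Write("CustID {0}: ", row);
            foreach (var col in columnKeys)
                Console.Write("{0} ", cells[(row, col.Year, col.Month)].Sum());
            Console.WriteLine();
        }
    }
}
```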

九歌凝 2024-07-14 08:46:04

// LINQPad Code for Amy B answer
void Main()
{
    List<CustData> myList = GetCustData();
    
    var query = myList
        .GroupBy(c => c.CustId)
        .Select(g => new
        {
            CustId = g.Key,
            Jan = g.Where(c => c.OrderDate.Month == 1).Sum(c => c.Qty),
            Feb = g.Where(c => c.OrderDate.Month == 2).Sum(c => c.Qty),
            March = g.Where(c => c.OrderDate.Month == 3).Sum(c => c.Qty),
            //April = g.Where(c => c.OrderDate.Month == 4).Sum(c => c.Qty),
            //May = g.Where(c => c.OrderDate.Month == 5).Sum(c => c.Qty),
            //June = g.Where(c => c.OrderDate.Month == 6).Sum(c => c.Qty),
            //July = g.Where(c => c.OrderDate.Month == 7).Sum(c => c.Qty),
            //August = g.Where(c => c.OrderDate.Month == 8).Sum(c => c.Qty),
            //September = g.Where(c => c.OrderDate.Month == 9).Sum(c => c.Qty),
            //October = g.Where(c => c.OrderDate.Month == 10).Sum(c => c.Qty),
            //November = g.Where(c => c.OrderDate.Month == 11).Sum(c => c.Qty),
            //December = g.Where(c => c.OrderDate.Month == 12).Sum(c => c.Qty)          
        });
        
    
    query.Dump();
}

/// <summary>
/// --------------------------------
/// CustID  | OrderDate     | Qty
/// --------------------------------
/// 1       | 1 / 1 / 2008  | 100
/// 2       | 1 / 2 / 2008  | 200
/// 1       | 2 / 2 / 2008  | 350
/// 2       | 2 / 28 / 2008 | 221
/// 1       | 3 / 12 / 2008 | 250
/// 2       | 3 / 15 / 2008 | 2150 
/// </summary>
public List<CustData> GetCustData()
{
    List<CustData> custData = new List<CustData>
    {
        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 1, 1),
            Qty = 100
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 1, 2),
            Qty = 200
        },

        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 2, 2),
            Qty = 350
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 2, 28),
            Qty = 221
        },

        new CustData
        {
            CustId = 1,
            OrderDate = new DateTime(2008, 3, 12),
            Qty = 250
        },

        new CustData
        {
            CustId = 2,
            OrderDate = new DateTime(2008, 3, 15),
            Qty = 2150
        },      
    };

    return custData;
}

public class CustData
{
    public int CustId;
    public DateTime OrderDate;
    public uint Qty;
}

苦妄 2024-07-14 08:46:04

This is a more efficient way:

Instead of iterating through each customer's group once for every month, the following approach makes a single pass over the group and accumulates all the month totals as it goes.

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => {
        var results = new CustomerStatistics();
        foreach (var customer in g)
        {
            switch (customer.OrderDate.Month)
            {
                case 1:
                    results.Jan += customer.Qty;
                    break;
                case 2:
                    results.Feb += customer.Qty;
                    break;
                case 3:
                    results.March += customer.Qty;
                    break;
                default:
                    break;
            }
        }
        return  new
        {
            CustId = g.Key,
            results.Jan,
            results.Feb,
            results.March
        };
    });

Or this one:

var query = myList
    .GroupBy(c => c.CustId)
    .Select(g => {
        var results = g.Aggregate(new CustomerStatistics(), (result, customer) => result.Accumulate(customer), customerStatistics => customerStatistics.Compute());
        return  new
        {
            CustId = g.Key,
            results.Jan,
            results.Feb,
            results.March
        };
    });

Complete solution:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            IEnumerable<CustData> myList = GetCustData().Take(100);

            var query = myList
                .GroupBy(c => c.CustId)
                .Select(g =>
                {
                    CustomerStatistics results = g.Aggregate(new CustomerStatistics(), (result, customer) => result.Accumulate(customer), customerStatistics => customerStatistics.Compute());
                    return new
                    {
                        CustId = g.Key,
                        results.Jan,
                        results.Feb,
                        results.March
                    };
                });
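            // The query is lazy; enumerate it so the pivot is actually computed and shown.
            foreach (var row in query)
                Console.WriteLine("{0}: Jan={1}, Feb={2}, March={3}", row.CustId, row.Jan, row.Feb, row.March);
            Console.ReadKey();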
        }

        private static IEnumerable<CustData> GetCustData()
        {
            Random random = new Random();
            int custId = 0;
            while (true)
            {
                custId++;
                yield return new CustData { CustId = custId, OrderDate = new DateTime(2018, random.Next(1, 4), 1), Qty = random.Next(1, 50) };
            }
        }

    }
    public class CustData
    {
        public int CustId { get; set; }
        public DateTime OrderDate { get; set; }
        public int Qty { get; set; }
    }
    public class CustomerStatistics
    {
        public int Jan { get; set; }
        public int Feb { get; set; }
        public int March { get; set; }
        internal CustomerStatistics Accumulate(CustData customer)
        {
            switch (customer.OrderDate.Month)
            {
                case 1:
                    Jan += customer.Qty;
                    break;
                case 2:
                    Feb += customer.Qty;
                    break;
                case 3:
                    March += customer.Qty;
                    break;
                default:
                    break;
            }
            return this;
        }
        public CustomerStatistics Compute()
        {
            return this;
        }
    }
}

清醇 2024-07-14 08:46:04

Group your data on month, and then project it into a new datatable with columns for each month. The new table would be your pivot table.
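
A minimal sketch of one way to read this suggestion: collect the distinct months as columns, then add one DataTable row per customer. The names DataTablePivot and ToPivotTable are invented for the example, the CustData class mirrors the one used elsewhere in this thread, and the "MMM-yyyy" column names are an assumption based on the layout in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

public class CustData
{
    public int CustId { get; set; }
    public DateTime OrderDate { get; set; }
    public int Qty { get; set; }
}

public static class DataTablePivot
{
    private static string ColumnName(DateTime month)
    {
        return month.ToString("MMM-yyyy", CultureInfo.InvariantCulture);
    }

    // Hypothetical helper: builds a pivot table with one column per month.
    public static DataTable ToPivotTable(IEnumerable<CustData> source)
    {
        var data = source.ToList();

        // One column per distinct month, in chronological order.
        var months = data
            .Select(c => new DateTime(c.OrderDate.Year, c.OrderDate.Month, 1))
            .Distinct()
            .OrderBy(m => m)
            .ToList();

        var table = new DataTable("Pivot");
        table.Columns.Add("CustID", typeof(int));
        foreach (var m in months)
            table.Columns.Add(ColumnName(m), typeof(int));

        // One row per customer; months without orders get 0.
        foreach (var grp in data.GroupBy(c => c.CustId).OrderBy(x => x.Key))
        {
            var row = table.NewRow();
            row["CustID"] = grp.Key;
            foreach (var m in months)
            {
                row[ColumnName(m)] = grp
                    .Where(c => c.OrderDate.Year == m.Year && c.OrderDate.Month == m.Month)
                    .Sum(c => c.Qty);
            }
            table.Rows.Add(row);
        }
        return table;
    }

    public static void Main()
    {
        var table = ToPivotTable(new List<CustData>
        {
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 1, 1), Qty = 100 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 1, 2), Qty = 200 },
            new CustData { CustId = 1, OrderDate = new DateTime(2008, 2, 2), Qty = 350 },
            new CustData { CustId = 2, OrderDate = new DateTime(2008, 2, 28), Qty = 221 },
        });

        Console.WriteLine(string.Join(" | ", table.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));
        foreach (DataRow row in table.Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```

Unlike the anonymous-type projections in the other answers, the columns here are discovered from the data, so nothing about the months is hardcoded.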
