IEnumerable as DataTable performance issue

Posted on 2024-11-29 19:06:17

I have the following extension, which generates a DataTable from an IEnumerable:

    public static DataTable AsDataTable<T>(this IEnumerable<T> enumerable)
    {
        DataTable table = new DataTable();

        T first = enumerable.FirstOrDefault();
        if (first == null)
            return table;

        PropertyInfo[] properties = first.GetType().GetProperties();
        foreach (PropertyInfo pi in properties)
            table.Columns.Add(pi.Name, pi.PropertyType);

        foreach (T t in enumerable)
        {
            DataRow row = table.NewRow();
            foreach (PropertyInfo pi in properties)
                row[pi.Name] = t.GetType().InvokeMember(pi.Name, BindingFlags.GetProperty, null, t, null);
            table.Rows.Add(row);
        }

        return table;
    }

However, with huge amounts of data, the performance isn't very good. Are there any obvious performance fixes that I'm unable to see?

Comments (4)

雨落□心尘 2024-12-06 19:06:17

First, a couple of non-perf problems:

  1. The type of the first item in the enumerable might be a subclass of T that defines properties that might not be present on other items. To avoid problems that this may cause, use the T type as the source of the properties list.
  2. The type might have properties that either have no getter or that have an indexed getter. Your code should not attempt to read their values.
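
A small sketch of both points, using hypothetical `Base`/`Derived` types made up for illustration. With the question's `first.GetType().GetProperties()`, a list whose first element is a `Derived` would get an `Extra` column that plain `Base` items cannot supply, and the write-only `Secret` property and the indexer (which reflection exposes as `Item`) would both reach `InvokeMember`, which cannot read them. Starting from `typeof(T)` and filtering leaves only `Name` for `T = Base`. The sketch also passes `BindingFlags.Public | BindingFlags.Instance` to skip static properties, which goes slightly beyond what this answer describes.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical types for illustration only.
public class Base
{
    public string Name { get; set; } = "";

    // Write-only: there is no getter to call.
    public string Secret { set { } }

    // Indexer: reflection reports it as a property named "Item" with one index parameter.
    public int this[int index] => index;
}

public class Derived : Base
{
    // Exists only on the subclass, so a plain Base item has nothing to read here.
    public int Extra { get; set; }
}

public static class PropertySelection
{
    // Names of the public instance properties of T that can be read without an index argument.
    public static string[] ReadableColumnNames<T>() =>
        typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Select(p => p.Name)
            .ToArray();
}
```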

On the perf side of things, I can see potential improvements in both the reflection and the DataTable loading:

  1. Cache the property getters and invoke them directly.
  2. Avoid accessing the data row columns by name to set the row values.
  3. Place the data table in "data loading" mode while adding the rows.

With these mods, you would end up with something like the following:

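using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Extension methods must live in a non-nested static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    public static DataTable AsDataTable<T>(this IEnumerable<T> enumerable)
    {
        if (enumerable == null)
        {
            throw new ArgumentNullException(nameof(enumerable));
        }

        DataTable table = new DataTable();
        if (enumerable.Any())
        {
            // Use typeof(T), not the first item's runtime type. Skip indexers and properties
            // without a public getter (CanRead is also true for a private getter, for which
            // GetGetMethod() returns null).
            IList<PropertyInfo> properties = typeof(T)
                                                .GetProperties()
                                                .Where(p => p.GetGetMethod() != null && (p.GetIndexParameters().Length == 0))
                                                .ToList();

            foreach (PropertyInfo property in properties)
            {
                // DataColumn does not accept Nullable<T>, so register the underlying type (int for int?).
                Type columnType = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;
                table.Columns.Add(property.Name, columnType);
            }

            // Look up each getter once, not once per row.
            IList<MethodInfo> getters = properties.Select(p => p.GetGetMethod()).ToList();

            table.BeginLoadData();
            try
            {
                object[] values = new object[properties.Count];
                foreach (T item in enumerable)
                {
                    for (int i = 0; i < getters.Count; i++)
                    {
                        values[i] = getters[i].Invoke(item, BindingFlags.Default, null, null, CultureInfo.InvariantCulture);
                    }

                    // Values are matched to columns by position, so there is no lookup by name.
                    table.Rows.Add(values);
                }
            }
            finally
            {
                table.EndLoadData();
            }
        }

        return table;
    }
}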

呆头 2024-12-06 19:06:17

Instead of doing:

row[pi.Name] = t.GetType().InvokeMember(pi.Name, BindingFlags.GetProperty, null, t, null);

use:

row[pi.Name] = pi.GetValue(t, null);

番薯 2024-12-06 19:06:17

You could always use a library like Fasterflect to emit IL instead of using true reflection for every property on every item in the list. I'm not sure about any gotchas with the DataTable.
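
A minimal sketch of the same idea without an extra dependency. Instead of Fasterflect, it uses the built-in `System.Linq.Expressions` to build and `Compile()` one `Func<T, object>` per property, so each row is read through plain delegate calls rather than `MethodInfo.Invoke`. On runtimes that support dynamic code generation, `Compile()` produces IL-backed delegates; where that is not available it may fall back to an interpreter. `AsDataTableCompiled` is a hypothetical name, and mapping `int?` properties to `int` columns with `DBNull` for nulls is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class CompiledGetterExtensions
{
    // Hypothetical helper: same output shape as AsDataTable, but property reads go
    // through compiled delegates instead of reflection on every row.
    public static DataTable AsDataTableCompiled<T>(this IEnumerable<T> enumerable)
    {
        if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));

        // Public instance properties with a public, non-indexed getter.
        PropertyInfo[] properties = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() != null && p.GetIndexParameters().Length == 0)
            .ToArray();

        // One delegate per property: item => (object)item.Property.
        // Compilation happens once per call; each row then costs only delegate calls.
        ParameterExpression itemParam = Expression.Parameter(typeof(T), "item");
        Func<T, object>[] getters = properties
            .Select(p => Expression.Lambda<Func<T, object>>(
                    Expression.Convert(Expression.Property(itemParam, p), typeof(object)),
                    itemParam)
                .Compile())
            .ToArray();

        var table = new DataTable();
        foreach (PropertyInfo p in properties)
        {
            // DataColumn does not accept Nullable<T>, so use the underlying type.
            table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
        }

        var values = new object[getters.Length];
        table.BeginLoadData();
        try
        {
            foreach (T item in enumerable)
            {
                for (int i = 0; i < getters.Length; i++)
                {
                    values[i] = getters[i](item) ?? DBNull.Value;
                }
                table.Rows.Add(values); // positional, no lookup by column name
            }
        }
        finally
        {
            table.EndLoadData();
        }
        return table;
    }
}
```

The compile step runs on every call here; if you call this repeatedly for the same `T`, caching the delegate array (for example in a static generic class) would be the natural next step.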

Alternatively, if this code is not trying to be a generic solution, you could always have whatever type is within the IEnumerable translate itself to a DataRow, thus avoiding reflection altogether.
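
A rough sketch of that alternative, assuming you control the element types. `IDataRowSource`, `Person` and `ToDataTable` are hypothetical names: each type returns its own values in column order, and the caller supplies an empty `DataTable` whose schema is copied with `Clone()`, so no reflection is involved at all.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical contract: a type that flattens itself into row values,
// in the same order as the columns of the schema it is used with.
public interface IDataRowSource
{
    object[] ToItemArray();
}

// Hypothetical example type; the value order must match the schema's column order.
public sealed class Person : IDataRowSource
{
    public string Name { get; set; } = "";
    public int Age { get; set; }

    public object[] ToItemArray() => new object[] { Name, Age };
}

public static class DataRowSourceExtensions
{
    // The caller supplies the schema; each item supplies its own values.
    public static DataTable ToDataTable<T>(this IEnumerable<T> items, DataTable schema)
        where T : IDataRowSource
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (schema == null) throw new ArgumentNullException(nameof(schema));

        DataTable table = schema.Clone(); // copies columns and constraints, not rows
        table.BeginLoadData();
        try
        {
            foreach (T item in items)
            {
                table.Rows.Add(item.ToItemArray());
            }
        }
        finally
        {
            table.EndLoadData();
        }
        return table;
    }
}
```

The trade-off is that every type has to keep `ToItemArray()` in sync with the schema by hand, which is exactly the bookkeeping the reflective version automates.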

北方。的韩爷 2024-12-06 19:06:17

You may not have a choice about this, but possibly look at the architecture of the code to see if you can avoid using a DataTable and rather return an IEnumerable<T> yourself.

Main reason(s) for doing that would be:

  1. You are going from an IEnumerable to a DataTable, which is effectively going from a streamed operation to a buffered operation.

    • Streamed: uses yield return so that results are only pulled off the enumeration as and when they are needed. It does not iterate the whole collection up front, the way a foreach that fills a collection would.

    • Buffered: pulls all of the results into memory (e.g. a populated collection, datatable or array) so all of the expense is incurred at once.

  2. If you can use an IEnumerable return type, then you can make use of the yield return keyword yourself, meaning you spread the cost of all that reflection out instead of incurring it all at once.
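
A minimal sketch of the streamed shape, assuming the consumer can work with rows of values instead of a `DataTable`. `AsValueRows` is a hypothetical name; it yields one `object[]` per item with `yield return`, reading properties with `PropertyInfo.GetValue`, so the reflective reads happen as the caller enumerates rather than all up front. This spreads the cost out rather than reducing it: enumerating every row still does the same amount of reflection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class StreamingExtensions
{
    // Hypothetical helper: lazily yields each item's property values as an object[] row.
    public static IEnumerable<object[]> AsValueRows<T>(this IEnumerable<T> source)
    {
        // Validate eagerly; the iterator body below only runs once enumeration starts.
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterate(source);
    }

    private static IEnumerable<object[]> Iterate<T>(IEnumerable<T> source)
    {
        // Property list is built once, on the first MoveNext.
        PropertyInfo[] properties = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        foreach (T item in source)
        {
            // A fresh array per row, because the caller may hold on to earlier rows.
            var values = new object[properties.Length];
            for (int i = 0; i < properties.Length; i++)
            {
                values[i] = properties[i].GetValue(item, null);
            }
            // Hand this row back before the next item is even pulled from the source.
            yield return values;
        }
    }
}
```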
