重构 - 速度提升

发布于 2024-09-04 03:05:23 字数 1386 浏览 4 评论 0原文

我怎样才能让这个功能更有效率。目前运行时间为 6 - 45 秒。 我已经在这个特定方法上运行了 dotTrace profiler,它的总时间在 6,000 毫秒到 45,000 毫秒之间。大部分时间花在“MoveNext”和“GetEnumerator”调用上。

举个例子,

71.55% CreateTableFromReportDataColumns - 18, 533* ms - 190 calls
 -- 55.71% MoveNext - 14,422ms - 10,775 calls 

我可以做些什么来加快这个方法的速度?它被多次调用,并且秒数加起来:

    private static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
        foreach (ReportData reportData in report.ReportDatas)
        {
            IEnumerable<string> cols = reportData.ReportDataColumns.Where(c => !String.IsNullOrEmpty(c.Name)).Select(x => x.Name).Distinct();

            foreach (var s in cols)
            {
                if (!String.IsNullOrEmpty(s))
                    colsToAdd.Add(s);
            }
        }

        foreach (string col in colsToAdd)
        {
            table.Columns.Add(col);
        }

        return table;
    }

如果您需要此处的 sql 表定义,它们是:

ReportData

ReportID            int

ReportDataColumn

ReportDataColumnId  int
ReportDataId        int 
Name                varchar(255)    
Value               text    

How can I make this function more efficient. It's currently running at 6 - 45 seconds.
I've ran dotTrace profiler on this specific method, and it's total time is anywhere between 6,000ms to 45,000ms. The majority of the time is spent on the "MoveNext" and "GetEnumerator" calls.

and example of the times are

71.55% CreateTableFromReportDataColumns - 18, 533* ms - 190 calls
 -- 55.71% MoveNext - 14,422ms - 10,775 calls 

can I do to speed this method up? it gets called a lot, and the seconds add up:

    private static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
        foreach (ReportData reportData in report.ReportDatas)
        {
            IEnumerable<string> cols = reportData.ReportDataColumns.Where(c => !String.IsNullOrEmpty(c.Name)).Select(x => x.Name).Distinct();

            foreach (var s in cols)
            {
                if (!String.IsNullOrEmpty(s))
                    colsToAdd.Add(s);
            }
        }

        foreach (string col in colsToAdd)
        {
            table.Columns.Add(col);
        }

        return table;
    }

If you need the sql table definitions here they are:

ReportData

ReportID            int

ReportDataColumn

ReportDataColumnId  int
ReportDataId        int 
Name                varchar(255)    
Value               text    

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

止于盛夏 2024-09-11 03:05:23

我相信您应该能够将您的功能简化为这样的内容

var columnsToAdd = report.ReportDatas
                    .SelectMany(r => r.ReportDataColumns)
                    .Select(rdc => rdc.Name)
                    .Distinct()
                    .Where(name => !string.IsNullOrEmpty(name));

,然后将名称添加到您的表中。

I believe you should be able to simplify your function into something like this

var columnsToAdd = report.ReportDatas
                    .SelectMany(r => r.ReportDataColumns)
                    .Select(rdc => rdc.Name)
                    .Distinct()
                    .Where(name => !string.IsNullOrEmpty(name));

And from there add the names to your table.

遥远的绿洲 2024-09-11 03:05:23

您的代码(仅)运行 foreach 循环,因此该方法将大部分时间花在 MoveNext() 等中的结论并不令人惊讶。

您正在对 isnullOrEmpty 和 Distinct 进行双重工作(由 HashSet 重复)。

我的版本是:

private static DataTable CreateTableFromReportDataColumns(Report report)
{
    DataTable table = new DataTable();
    HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
    foreach (ReportData reportData in report.ReportDatas)
    {

        foreach (var column in reportData.ReportDataColumns)
        {
            if (!String.IsNullOrEmpty(column.Name))
                colsToAdd.Add(column.Name);
        }
    }

    foreach (string col in colsToAdd)
    {
        table.Columns.Add(col);
    }

    return table;
}

但我不期望有巨大的改进

Your code (only) runs foreach loops so the conclusion that the method spends most of its time in MoveNext() et al is not so surprising.

You are doing double work on both the isnullOrEmpty and the Distinct (is repeated by the HashSet).

My version would be:

private static DataTable CreateTableFromReportDataColumns(Report report)
{
    DataTable table = new DataTable();
    HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
    foreach (ReportData reportData in report.ReportDatas)
    {

        foreach (var column in reportData.ReportDataColumns)
        {
            if (!String.IsNullOrEmpty(column.Name))
                colsToAdd.Add(column.Name);
        }
    }

    foreach (string col in colsToAdd)
    {
        table.Columns.Add(col);
    }

    return table;
}

But I don't expect a huge improvement

物价感观 2024-09-11 03:05:23

当您提出问题时,您应该提到 LinqToSql,然后您会得到一些响应来查看您的数据库,看看它是否是一个长时间运行的查询或重复往返查询

private static DataTable CreateTableFromReportDataColumns(Report report) 
{ 
    DataTable table = new DataTable(); 
    table.Columns.Add("DataStream");
    IEnumerable<string> moreColumns = report.ReportDatas
      .SelectMany(z => z.ReportDataColumns)
      .Select(x => x.Name)
      .Where(s => s != null && s != "")
      .Distinct();

    foreach (string col in moreColumns) 
    { 
        table.Columns.Add(col); 
    } 

    return table; 
} 

此外,捕获由使用 sql 分析器。然后通过使用之前的这些语句运行查询来分析查询的 IO 和 TIME

SET STATISTICS TIME ON
SET STATISTICS IO ON
  --your query here

最后,您可能需要一两个索引来降低 IO。列顺序在这里很重要。

CREATE INDEX IX1_ReportData ON ReportData(ReportID, Id)
CREATE INDEX IX1_ReportDataColumn ON ReportDataColumn(ReportDataId, Name)

You should have mentioned LinqToSql when you asked the question, then you would have gotten some responses to look into your database to see if it's a long running query or repeated round trip querying

private static DataTable CreateTableFromReportDataColumns(Report report) 
{ 
    DataTable table = new DataTable(); 
    table.Columns.Add("DataStream");
    IEnumerable<string> moreColumns = report.ReportDatas
      .SelectMany(z => z.ReportDataColumns)
      .Select(x => x.Name)
      .Where(s => s != null && s != "")
      .Distinct();

    foreach (string col in moreColumns) 
    { 
        table.Columns.Add(col); 
    } 

    return table; 
} 

Also, capture the query issued by using the sql profiler. Then analyze the IO and TIME of the query by running it with these statements before

SET STATISTICS TIME ON
SET STATISTICS IO ON
  --your query here

Lastly, you may need an index or two to bring the IO down. Column order is important here.

CREATE INDEX IX1_ReportData ON ReportData(ReportID, Id)
CREATE INDEX IX1_ReportDataColumn ON ReportDataColumn(ReportDataId, Name)
一场信仰旅途 2024-09-11 03:05:23

这可能是对 Hank 代码的轻微改进。它利用了这样一个事实:HashSet 会告诉您添加操作是否成功或元素是否已存在。

private static DataTable CreateTableFromReportDataColumns(Report report)
{
    HashSet<string> uniqueNames = new HashSet<string> { null, "", "DataStream" };

    DataTable table = new DataTable();
    table.Columns.Add("DataStream");

    foreach (ReportData reportData in report.ReportDatas)
    {
        foreach (var dataColumn in reportData.ReportDataColumns)
        {
            if (uniqueNames.Add(dataColumn.Name))
            {
                table.Columns.Add(dataColumn.Name);
            }
        }
    }

    return table;
}

编辑:我继续在开头将 null 和“”添加到哈希集中,因此我们不再需要检查 null 或empty。

This might be a slight improvement on Hank's code. It takes advantage of the fact that the HashSet will tell you if the Add operation was successful or the element already existed.

private static DataTable CreateTableFromReportDataColumns(Report report)
{
    HashSet<string> uniqueNames = new HashSet<string> { null, "", "DataStream" };

    DataTable table = new DataTable();
    table.Columns.Add("DataStream");

    foreach (ReportData reportData in report.ReportDatas)
    {
        foreach (var dataColumn in reportData.ReportDataColumns)
        {
            if (uniqueNames.Add(dataColumn.Name))
            {
                table.Columns.Add(dataColumn.Name);
            }
        }
    }

    return table;
}

Edit: I went ahead and added null and "" to the hash set at the beginning, so we no longer need the check for null or empty.

过度放纵 2024-09-11 03:05:23
  • 重复 string.isnullorempty 检查,
  • 您可以通过执行 SelectMany 来摆脱 foreach(我看到安东尼刚刚发布了相同的:),
  • 以保持“DataStream”列的相同语义(切换到安东尼的版本后),您可以执行 new HashSet(columnsToAdd) { "DataStream" } 但添加(通过 concat 或 union 或其他方式)“DataStream”字符串然后用 Distinct() 结果可能会更容易/更快,并避免创建 HashSet(也可以分析两者) )

这可能有点过分了(取决于 ReportDatas 中的条目数、每个 ReportDataColumns 中的列数、主机上的内核数等),但您也可以进行并行化。

例如,如果您决定并行处理 ReportDatas 条目,您可以让每个条目创建自己的列集合,或者将它们全部写入 ConcurrentBag 中,您可以在以下情况下区分它们:一切都完成了或者无论如何。

  • the string.isnullorempty check is repeated
  • you can get rid of the foreach's by doing SelectMany (i see Anthony just posted the same :)
  • to keep the same semantics for the "DataStream" column (after switching to Anthony's version), you could do new HashSet(columnsToAdd) { "DataStream" } but it might be easier/faster to just add (via concat or union or whatever) the "DataStream" string and then Distinct() the result and avoid the HashSet creation (might as well profile both)

It might be overkill for this (depending on the number of entries in ReportDatas, number of columns in each ReportDataColumns, number of cores on host machine, etc) but you could also potentially parallelize.

If you decided to process the ReportDatas entries in parallel, for instance, you could have each one either create its own collection of columns or have them all write into a ConcurrentBag that you Distinct when it's all done or whatever.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文