Refactoring - speed improvement
How can I make this function more efficient? It currently runs in 6-45 seconds.
I've run the dotTrace profiler on this specific method, and its total time is anywhere between 6,000 ms and 45,000 ms. The majority of the time is spent in the "MoveNext" and "GetEnumerator" calls.
An example of the timings:
71.55% CreateTableFromReportDataColumns - 18,533 ms - 190 calls
-- 55.71% MoveNext - 14,422 ms - 10,775 calls
What can I do to speed this method up? It gets called a lot, and the seconds add up:
private static DataTable CreateTableFromReportDataColumns(Report report)
{
    DataTable table = new DataTable();
    HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
    foreach (ReportData reportData in report.ReportDatas)
    {
        IEnumerable<string> cols = reportData.ReportDataColumns.Where(c => !String.IsNullOrEmpty(c.Name)).Select(x => x.Name).Distinct();
        foreach (var s in cols)
        {
            if (!String.IsNullOrEmpty(s))
                colsToAdd.Add(s);
        }
    }
    foreach (string col in colsToAdd)
    {
        table.Columns.Add(col);
    }
    return table;
}
If you need the SQL table definitions, here they are:
ReportData
    ReportID int
ReportDataColumn
    ReportDataColumnId int
    ReportDataId int
    Name varchar(255)
    Value text
I believe you should be able to simplify your function into something like this
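The answer's code snippet did not survive the copy. A minimal sketch of the kind of simplification it describes, with hypothetical stand-in classes so it compiles on its own (the real types are LINQ to SQL entities):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-ins for the question's entity types.
class ReportDataColumn { public string Name; }
class ReportData { public List<ReportDataColumn> ReportDataColumns = new List<ReportDataColumn>(); }
class Report { public List<ReportData> ReportDatas = new List<ReportData>(); }

static class ReportTables
{
    // One SelectMany pipeline flattens all columns; a single Distinct()
    // replaces both the per-ReportData Distinct() and the HashSet.
    public static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        table.Columns.Add("DataStream");

        IEnumerable<string> names = report.ReportDatas
            .SelectMany(rd => rd.ReportDataColumns)
            .Select(c => c.Name)
            .Where(n => !String.IsNullOrEmpty(n) && n != "DataStream")
            .Distinct();

        foreach (string name in names)
            table.Columns.Add(name);

        return table;
    }
}
```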
And from there add the names to your table.
Your code (only) runs foreach loops, so the conclusion that the method spends most of its time in MoveNext() et al. is not so surprising.
You are doing double work on both the IsNullOrEmpty check and the Distinct (which is repeated by the HashSet).
My version would be:
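The version itself was lost from the post. Based on the description (drop the LINQ iterators and the inner null check, and let the HashSet do the de-duplication), it was presumably something like this sketch, again with hypothetical stand-in types:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical stand-ins so the sketch is self-contained.
class ReportDataColumn { public string Name; }
class ReportData { public List<ReportDataColumn> ReportDataColumns = new List<ReportDataColumn>(); }
class Report { public List<ReportData> ReportDatas = new List<ReportData>(); }

static class ReportTables
{
    public static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        HashSet<string> colsToAdd = new HashSet<string> { "DataStream" };

        // No Where/Select/Distinct iterator chain and no second null
        // check: the HashSet already ignores duplicates.
        foreach (ReportData reportData in report.ReportDatas)
            foreach (ReportDataColumn c in reportData.ReportDataColumns)
                if (!String.IsNullOrEmpty(c.Name))
                    colsToAdd.Add(c.Name);

        foreach (string col in colsToAdd)
            table.Columns.Add(col);

        return table;
    }
}
```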
But I don't expect a huge improvement.
You should have mentioned LinqToSql when you asked the question; then you would have gotten some responses suggesting you look into your database to see whether it's one long-running query or repeated round-trip querying.
Also, capture the query issued by using the SQL profiler. Then analyze the IO and TIME of the query by running it with these statements before it.
Lastly, you may need an index or two to bring the IO down. Column order is important here.
This might be a slight improvement on Hank's code. It takes advantage of the fact that the HashSet will tell you if the Add operation was successful or the element already existed.
Edit: I went ahead and added null and "" to the hash set at the beginning, so we no longer need the check for null or empty.
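The snippet itself is missing from the copy, but the description is explicit enough to sketch: seed the set with null and "" so that Add returns false for them, and use Add's bool result to decide whether to create the column (stand-in types assumed):

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical stand-ins so the sketch is self-contained.
class ReportDataColumn { public string Name; }
class ReportData { public List<ReportDataColumn> ReportDataColumns = new List<ReportDataColumn>(); }
class Report { public List<ReportData> ReportDatas = new List<ReportData>(); }

static class ReportTables
{
    public static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        table.Columns.Add("DataStream");

        // Pre-seeding with null, "" and "DataStream" means Add() returns
        // false for all of them, so the loop needs no explicit checks.
        HashSet<string> seen = new HashSet<string> { null, "", "DataStream" };

        foreach (ReportData reportData in report.ReportDatas)
            foreach (ReportDataColumn c in reportData.ReportDataColumns)
                if (seen.Add(c.Name))
                    table.Columns.Add(c.Name);

        return table;
    }
}
```

Note that `HashSet<string>` happily stores null, which is what makes the pre-seeding trick work.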
It might be overkill for this (depending on the number of entries in ReportDatas, the number of columns in each ReportDataColumns, the number of cores on the host machine, etc.), but you could also potentially parallelize.
If you decided to process the ReportDatas entries in parallel, for instance, you could have each one either create its own collection of columns, or have them all write into a ConcurrentBag that you Distinct when it's all done, or whatever.
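A sketch of the ConcurrentBag variant under the same assumed stand-in types. One caveat: enumerating a live LINQ to SQL context from multiple threads is not safe, so the data would need to be materialized in memory first.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins so the sketch is self-contained.
class ReportDataColumn { public string Name; }
class ReportData { public List<ReportDataColumn> ReportDataColumns = new List<ReportDataColumn>(); }
class Report { public List<ReportData> ReportDatas = new List<ReportData>(); }

static class ReportTables
{
    public static DataTable CreateTableFromReportDataColumns(Report report)
    {
        var bag = new ConcurrentBag<string>();

        // Each ReportData is scanned on its own worker; ConcurrentBag
        // absorbs the writes without any locking in user code.
        Parallel.ForEach(report.ReportDatas, reportData =>
        {
            foreach (ReportDataColumn c in reportData.ReportDataColumns)
                if (!String.IsNullOrEmpty(c.Name))
                    bag.Add(c.Name);
        });

        // De-duplicate once, after all workers are done.
        DataTable table = new DataTable();
        table.Columns.Add("DataStream");
        foreach (string name in bag.Distinct().Where(n => n != "DataStream"))
            table.Columns.Add(name);

        return table;
    }
}
```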