Speed improvement for a .Net-based modeling application
We are developing an analytical tool for stock modeling in .Net.
The primary objective of the tool is to run a model over 5 years and project future In, Out and Stock quantities for various products.
The primary workflow of the code is:
1. Fetch the data from the database.
2. For each date, process the data (run the Production and Stock models).
3. After all the dates are traversed, update all the data back to the database in one batch.
So there are essentially only two database calls: initially we take all the data into DataSets, then we process everything in RAM without making further database calls.
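For reference in the discussion below, here is a minimal sketch of that fetch-process-update shape; the class and method names are illustrative placeholders, not the actual iPlanner code:

    using System;
    using System.Data;

    class ModelRunner
    {
        // DB call #1: load everything up front (stands in for the real query logic).
        DataSet FetchAllData() { return new DataSet(); }

        // Pure in-memory work per date; no database access inside the loop.
        void RunProductionModel(DataSet data, DateTime date) { }
        void RunStockModel(DataSet data, DateTime date) { }

        // DB call #2: write all results back in one batch.
        void BulkUpdate(DataSet data) { }

        public void Run(DateTime start, DateTime end)
        {
            DataSet data = FetchAllData();
            for (DateTime d = start; d <= end; d = d.AddDays(1))
            {
                RunProductionModel(data, d);
                RunStockModel(data, d);
            }
            BulkUpdate(data);
        }
    }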
The problem we faced was that it was taking almost an hour to run the model for 1 year. Our benchmark is to run the model for 5 years in 5 minutes.
We have been working on this problem for almost a month now. Right now we are able to run the model for 1 year in 10 minutes. Following are the things we have found out:
- While fetching data from the DataSet, lookups were slow when the tables carried all five years of data, so we divided the DataSets into monthly loops and now run the model one month at a time (see the sketch after this list). This has given us the biggest improvement in speed.
- We tried to reduce the for loops inside the model that runs daily. This did not give us much improvement.
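To make the first point concrete, here is a rough sketch of the monthly slicing, assuming the cost was DataTable.Select (or similar per-row lookups) scanning five years of rows on every access; the column name is hypothetical:

    using System;
    using System.Data;

    class MonthlySlicer
    {
        // Copy one month's rows out of the full table so that subsequent
        // in-memory lookups only scan ~30 days of data instead of 5 years.
        static DataTable SliceMonth(DataTable full, string dateColumn, DateTime monthStart)
        {
            DateTime next = monthStart.AddMonths(1);
            string filter = string.Format("{0} >= #{1}# AND {0} < #{2}#",
                dateColumn,
                monthStart.ToString("MM/dd/yyyy"),
                next.ToString("MM/dd/yyyy"));

            DataTable slice = full.Clone();          // same schema, no rows
            foreach (DataRow row in full.Select(filter))
                slice.ImportRow(row);
            return slice;
        }
    }

The model then runs against the small per-month slice, and the results are merged back before the single bulk update.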
You can download a rar file from the following link.
http://dl.dropbox.com/u/4546390/iPlanner.rar
It contains three files.
iPlanner Tables.xls: gives an idea of the database design.
iPlanner Logic.xls: describes the tables and the logic of the production model, the shipment model and actual-value handling. I think the most important thing is to look at the production model; it will give you a brief idea of what the model does daily.
Common.cs: contains the Call Production Model function, from where everything starts. You can check that out too.
The model was previously written in Excel, where it used to take 2 minutes for 5 years. The reason to move to .Net was to have more sharing features and a software-like look.
I am trying to find out the ways in which this can be improved.
Let me know if more information is required on this.
Thanks in Advance
Comments (2)
If the calculations done for each date are independent, this sounds like a good application of map/reduce. How much have you explored the idea of parallelizing this calculation? Sixty Hadoop processors, one for each month in the five-year window, could make short work of it.
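For what it's worth, the same map/reduce idea can be sketched in-process with .NET's Task Parallel Library (available from .NET 4.0) instead of a Hadoop cluster; this still assumes each month's calculation is genuinely independent, i.e. no stock carried over between months:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class ParallelMonths
    {
        // Placeholder for one month's worth of model computation.
        static void RunModelForMonth(DateTime monthStart)
        {
            // ... run production and stock models for every date in the month ...
        }

        static void Main()
        {
            var months = new List<DateTime>();
            for (DateTime m = new DateTime(2011, 1, 1); m < new DateTime(2016, 1, 1); m = m.AddMonths(1))
                months.Add(m);                       // 60 months in the five-year window

            // "Map": process independent months concurrently across cores.
            Parallel.ForEach(months, RunModelForMonth);

            // "Reduce": merge the per-month results and write back in one batch.
        }
    }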
First: profile ;p
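(In the absence of a full profiler, one cheap way to start is to bracket suspect sections with System.Diagnostics.Stopwatch and see where the time actually goes; the section being timed below is just an example:)

    using System;
    using System.Diagnostics;

    class TimingExample
    {
        static void Main()
        {
            Stopwatch sw = Stopwatch.StartNew();
            // ... the code under suspicion, e.g. one day's model run ...
            sw.Stop();
            Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        }
    }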
The next thing I'd try is taking DataTable out of the system, in favor of strongly typed classes that exactly match your data. Although data-load speed isn't the problem, I'd use something like dapper-dot-net to make loading the data as efficient as possible. With a DataTable, every member access is indirect and has to come via an internal lookup, possibly involving boxing en route. Cut all of that out by using static binding to the actual data properties (which are almost always inlined to the fields). Unfortunately, the impact of this is a bit hard to estimate in advance, as the change is non-trivial.
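A sketch of what the strongly typed approach might look like, with a hypothetical row type and table names (not the actual iPlanner schema); dapper's Query<T> maps columns to properties by name, and every subsequent access to a property like row.Stock is statically bound, with no lookup or boxing:

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;                       // dapper-dot-net extension methods

    // Hypothetical POCO matching one row of stock data.
    class StockRow
    {
        public int ProductId { get; set; }
        public DateTime Date { get; set; }
        public double InQty { get; set; }
        public double OutQty { get; set; }
        public double Stock { get; set; }
    }

    class TypedLoad
    {
        static List<StockRow> LoadStock(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                // Table and column names are illustrative only.
                return conn.Query<StockRow>(
                    "select ProductId, Date, InQty, OutQty, Stock from StockData")
                    .ToList();
            }
        }
    }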