Performance problem - large DLLs and large namespaces
This is sort of the next step of the LINQ to DB2 question I asked here.
Following zb_z's answer, I poked around a bit with the code for DB_Linq and have managed to add working DB2 support. (It's still in its infancy now, not ready to be contributed back to the project yet.) The proof of concept worked great, it was pretty exciting actually. However, I've run into another hiccup along the way.
As it turns out, our DB2 database is big. 8,306 tables big. So the code that was generated turned out to be over 5.2 million lines of code. In one file. Needless to say, Visual Studio didn't much care for it :)
So I further modified the generator to spit out each table class into its own file. This left me with 8,307 files (the data context and one for each table, which extend the data context with table properties). Visual Studio still didn't like it, understandably, so I wrapped up the code generation and compilation in a script and just run that to output a DLL for my projects to use.
A 36 MB DLL.
Now, searching around a bit on performance led me to this SO question (which itself references this one), and I've followed the answers and links to see what they're saying. This makes me wonder whether the existence of over 8,000 classes within the same namespace is the culprit behind the noticeable performance issues.
My performance test is a small console app that initializes the data context, fetches the data with LINQ, prints a row count, fetches the same data with classic ADO, and prints another row count. Each statement prints a timestamp. Adding more test queries always gives the same picture: the LINQ code takes several seconds to run, while ADO fills the DataSet in the blink of an eye.
So I guess this ends up being a somewhat open-ended (and long-winded, sorry about that) question. Does anybody have any ideas on speeding up performance here? Anything simple to tweak, or design considerations I could apply?
EDIT
A few things to note:
- If I restrict the code generation to a subset of tables (say, 200) then it runs much faster.
- Stepping through in the debugger, I can see that the time is spent on the line
var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t.T9ADDEP
When I then expand the property in the debugger to enumerate the results (or let it go on to the next line, which does a .Count()), that part takes no time at all.
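To separate the two phases described above (building the query versus running it), a Stopwatch around each phase is more precise than printing DateTime.Now timestamps. The following is a minimal sketch, assuming an in-memory IQueryable as a stand-in for bank1.TMX9800F; Row and QueryTiming are hypothetical names, not part of DbLinq or the generated DLL.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for a generated table class such as TMX9800F;
// only the column used in the question's query is modelled.
public class Row
{
    public int T9ADDEP { get; set; }
}

public static class QueryTiming
{
    // Times query composition and query execution separately.
    // "Build" covers everything inside buildQuery (for a real data context,
    // that includes accessing the table property); "Execute" is the Count()
    // call, which is when a LINQ provider actually runs the query.
    public static (TimeSpan Build, TimeSpan Execute, int Count) Measure<T>(
        Func<IQueryable<T>> buildQuery)
    {
        var sw = Stopwatch.StartNew();
        IQueryable<T> query = buildQuery();
        TimeSpan build = sw.Elapsed;

        sw.Restart();
        int count = query.Count();
        TimeSpan execute = sw.Elapsed;

        return (build, execute, count);
    }
}
```

With the real context, the lambda would be something like () => from t in bank1.TMX9800F where t.T9ADDEP > 0 select t. Based on the debugger observation above, the Build figure would be expected to dominate, which points at setup work done before any SQL is sent.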
EDIT
I can't post the entire generated DLLs, but here's the code for the test app:
using System;
using System.Configuration;
using System.Data;
using System.Linq;
using IBM.Data.DB2.iSeries;
// plus the namespace of the generated BNKPRD01/BNKPRD06 data contexts (not shown)

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(string.Format("{0}: Process Started", DateTime.Now.ToLongTimeString()));
        // Initialize your data contexts
        var bank1 = new BNKPRD01(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
        var bank6 = new BNKPRD06(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
        Console.WriteLine(string.Format("{0}: Data contexts initialized", DateTime.Now.ToLongTimeString()));
        var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
        // Arguments are evaluated left to right, so each timestamp below is taken before Count() runs.
        Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), foo.Count().ToString()));
        var baz = from t in bank6.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
        Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), baz.Count().ToString()));
        // Same queries through classic ADO for comparison
        var ds = new DataSet();
        using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
        {
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT * FROM BNKPRD01.TMX9800F WHERE T9ADDEP > 0";
                new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
            }
        }
        Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));
        ds = new DataSet();
        using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
        {
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT * FROM BNKPRD06.TMX9800F WHERE T9ADDEP > 0";
                new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
            }
        }
        Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));
        Console.WriteLine("Press return to exit.");
        Console.ReadLine();
    }
}
Maybe I'm missing something obvious or there's something about LINQ I didn't grok?
EDIT
Following the discussion with Jon and Brian below, I stepped further into the DB_Linq code that gets called when the LINQ query is created and found the slow step:
public override IEnumerable<MetaTable> GetTables()
{
    const BindingFlags scope = BindingFlags.GetField |
        BindingFlags.GetProperty | BindingFlags.Static |
        BindingFlags.Instance | BindingFlags.NonPublic |
        BindingFlags.Public;
    var seen = new HashSet<Type>();
    foreach (var info in _ContextType.GetMembers(scope))
    {
        // Only look for Fields & Properties.
        if (info.MemberType != MemberTypes.Field && info.MemberType != MemberTypes.Property)
            continue;
        Type memberType = info.GetMemberType();
        if (memberType == null || !memberType.IsGenericType ||
            memberType.GetGenericTypeDefinition() != typeof(Table<>))
            continue;
        var tableType = memberType.GetGenericArguments()[0];
        if (tableType.IsGenericParameter)
            continue;
        if (seen.Contains(tableType))
            continue;
        seen.Add(tableType);
        MetaTable metaTable;
        if (_Tables.TryGetValue(tableType, out metaTable))
            yield return metaTable;
        else
            yield return AddTableType(tableType);
    }
}
That loop iterates 16,718 times.
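The loop in GetTables() walks every field and property of the context type, so its cost grows with the number of tables. Assuming it is invoked each time a query is built (the caller is not shown, so this is an assumption), one option is to memoize the result per context type so the walk runs once per process. This is a minimal sketch only: Table<T>, SampleContext, and TableTypeCache are hypothetical stand-ins, not DbLinq's actual types, and it caches the row types rather than DbLinq's MetaTable objects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;
using System.Threading;

// Hypothetical stand-in for DbLinq's Table<T>; only its generic shape matters here.
public class Table<T> { }

// Hypothetical context mirroring a generated data context such as BNKPRD01.
public class SampleContext
{
    public Table<string> Customers { get; } = new Table<string>();
    public Table<int> Orders { get; } = new Table<int>();
    public Table<int> OrdersAgain = new Table<int>(); // same row type twice: reported once
    public string NotATable { get; set; }
}

public static class TableTypeCache
{
    // One entry per context type; the reflection walk runs only on a cache miss.
    private static readonly ConcurrentDictionary<Type, Type[]> Cache =
        new ConcurrentDictionary<Type, Type[]>();

    private static int _scans;

    // Number of times the expensive walk has run (for illustration only).
    public static int ScanCount => Volatile.Read(ref _scans);

    // Under contention GetOrAdd may run Scan more than once for the same type,
    // but only one result is stored and handed to every caller.
    public static Type[] GetTableTypes(Type contextType) =>
        Cache.GetOrAdd(contextType, Scan);

    // Same filtering as the GetTables() loop above, returning row types instead of MetaTables.
    private static Type[] Scan(Type contextType)
    {
        Interlocked.Increment(ref _scans);
        const BindingFlags scope = BindingFlags.Static | BindingFlags.Instance |
                                   BindingFlags.NonPublic | BindingFlags.Public;
        var seen = new HashSet<Type>();
        var result = new List<Type>();
        foreach (MemberInfo info in contextType.GetMembers(scope))
        {
            // Only fields and properties can hold a Table<T>.
            Type memberType = info switch
            {
                FieldInfo f => f.FieldType,
                PropertyInfo p => p.PropertyType,
                _ => null
            };
            if (memberType == null || !memberType.IsGenericType ||
                memberType.GetGenericTypeDefinition() != typeof(Table<>))
                continue;
            Type tableType = memberType.GetGenericArguments()[0];
            if (!tableType.IsGenericParameter && seen.Add(tableType))
                result.Add(tableType);
        }
        return result.ToArray();
    }
}
```

In DbLinq itself, the equivalent change would presumably live in the mapping layer; whether caching there is safe depends on how DbLinq shares its mapping model between context instances, which the question doesn't show.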
Comments (2)
I just created a small test project with 10,000 classes in a namespace, and while there's a noticeable overhead when loading/JITting the assembly, I wouldn't say that it is particularly slow. So the number of classes itself is probably not the reason for the bad performance you're seeing.
I'm with Jon here; more info on your test app would be helpful.
Posting the console app would really help.
Having many classes in a namespace and in an assembly will slow down compilation and there will be a one-time hit of JITting for each method in each type... but I wouldn't expect it to slow down LINQ queries.
You should check what SQL is actually being generated from your LINQ queries. I would expect the problem to lie there.