Performance issue - large DLL and large namespace

Posted 2024-09-13 06:19:00

This is sort of the next step of the LINQ to DB2 question I asked here.

Following zb_z's answer, I poked around a bit in the DB_Linq code and have managed to add working DB2 support. (It's still in its infancy, not ready to be contributed back to the project yet.) The proof of concept worked great; it was pretty exciting, actually. However, I've run into another hiccup along the way.

As it turns out, our DB2 database is big. 8,306 tables big. So the code that was generated turned out to be over 5.2 million lines of code. In one file. Needless to say, Visual Studio didn't much care for it :)

So I further modified the generator to spit out each table class into its own file. This left me with 8,307 files (the data context and one for each table, which extend the data context with table properties). Visual Studio still didn't like it, understandably, so I wrapped up the code generation and compilation in a script and just run that to output a DLL for my projects to use.
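For readers picturing the one-file-per-table layout, here is a minimal sketch. It assumes the generator relies on C# partial classes; the names BankContext, Customer, and GetTable are hypothetical, and Table<T> is a stand-in for the provider's own type, not DB_Linq's actual class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the provider's Table<T>; only here so the sketch compiles.
public sealed class Table<T>
{
    public Type RowType { get { return typeof(T); } }
}

// File 1 of N (for example BankContext.cs): the data context itself.
public partial class BankContext
{
    private readonly Dictionary<Type, object> _tables = new Dictionary<Type, object>();

    // Creates each Table<T> on first use and hands back the same instance afterwards.
    protected Table<T> GetTable<T>()
    {
        object table;
        if (!_tables.TryGetValue(typeof(T), out table))
        {
            table = new Table<T>();
            _tables.Add(typeof(T), table);
        }
        return (Table<T>)table;
    }
}

// File 2 of N (for example Customer.cs): one row class...
public class Customer
{
    public int Id { get; set; }
}

// ...plus a partial declaration that adds the matching table property to the context.
public partial class BankContext
{
    public Table<Customer> Customers { get { return GetTable<Customer>(); } }
}
```

Under this layout every generated file adds one more property to the same context type, so with 8,306 tables the context class ends up carrying 8,306 table properties.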

A 36 MB DLL.

Now, searching around a bit on performance led me to this SO question (which itself references this one), and I've followed the answers and the links to see what they're saying. This leads me to wonder whether having over 8,000 classes within the same namespace is what's behind the noticeable performance issues.

My test for performance was to write a little console app that initializes the data context, grabs the data with LINQ, prints out a row count, grabs the data with classic ADO, and prints out another row count. Each statement includes a time stamp. Adding more queries to test, etc. always results in the same performance. The LINQ code takes several seconds to run, while the ADO fills the dataset in the blink of an eye.

So I guess this ends up being a somewhat open-ended (and long-winded, sorry about that) question. Does anybody have any ideas on speeding up performance here? Anything simple to tweak, or design considerations I could apply?

EDIT

A few things to note:

If I restrict the code generation to a subset of tables (say, 200) then it runs much faster.
Stepping through in the debugger, the time is spent on the line var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t.T9ADDEP. When I expand the property in the debugger to enumerate the results (or let it continue to the next line, which calls .Count()), that part takes no time at all.
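To put numbers on the two phases noted above, a Stopwatch can time query construction and enumeration separately, which is finer-grained than DateTime.Now timestamps. This is only a sketch: QueryTiming and Measure are hypothetical names, and the demo runs against an in-memory array rather than the real DB2 table.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QueryTiming
{
    // Times building the query and enumerating it as two separate steps.
    public static void Measure<T>(Func<IEnumerable<T>> buildQuery,
                                  out TimeSpan buildTime,
                                  out TimeSpan runTime,
                                  out int rowCount)
    {
        var sw = Stopwatch.StartNew();
        IEnumerable<T> query = buildQuery();   // the query is only declared here
        buildTime = sw.Elapsed;

        sw.Restart();
        rowCount = query.Count();              // the query is enumerated here
        runTime = sw.Elapsed;
    }

    // In-memory stand-in for bank1.TMX9800F; swap in the real table to measure the provider.
    public static int Demo()
    {
        int[] rows = Enumerable.Range(-5, 20).ToArray();   // -5 .. 14

        TimeSpan build, run;
        int count;
        Measure(() => from r in rows where r > 0 select r, out build, out run, out count);

        Console.WriteLine("build: {0} ms, run: {1} ms, rows: {2}",
            build.TotalMilliseconds, run.TotalMilliseconds, count);
        return count;
    }
}
```

With a deferred-execution provider the build step is normally the cheap one, so a large build time points at work done while the query is being set up rather than at the database round trip.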

EDIT

I can't post the entire generated DLLs, but here's the code for the test app:

static void Main(string[] args)
{
    Console.WriteLine(string.Format("{0}: Process Started", DateTime.Now.ToLongTimeString()));

    // Initialize your data contexts
    var bank1 = new BNKPRD01(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
    var bank6 = new BNKPRD06(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
    Console.WriteLine(string.Format("{0}: Data contexts initialized", DateTime.Now.ToLongTimeString()));

    var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), foo.Count().ToString()));

    var baz = from t in bank6.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), baz.Count().ToString()));

    var ds = new DataSet();
    using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT * FROM BNKPRD01.TMX9800F WHERE T9ADDEP > 0";
            new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
        }
    }
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));

    ds = new DataSet();
    using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT * FROM BNKPRD06.TMX9800F WHERE T9ADDEP > 0";
            new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
        }
    }
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));

    Console.WriteLine("Press return to exit.");
    Console.ReadLine();
}

Maybe I'm missing something obvious or there's something about LINQ I didn't grok?

EDIT

After discussing this with Jon and Brian below, I stepped further into the DB_Linq code that gets called when the LINQ query is created and found the slow step:

public override IEnumerable<MetaTable> GetTables()
{
    const BindingFlags scope = BindingFlags.GetField |
        BindingFlags.GetProperty | BindingFlags.Static |
        BindingFlags.Instance | BindingFlags.NonPublic |
        BindingFlags.Public;
    var seen = new HashSet<Type>();
    foreach (var info in _ContextType.GetMembers(scope))
    {
        // Only look for Fields & Properties.
        if (info.MemberType != MemberTypes.Field && info.MemberType != MemberTypes.Property)
            continue;
        Type memberType = info.GetMemberType();

        if (memberType == null || !memberType.IsGenericType ||
                memberType.GetGenericTypeDefinition() != typeof(Table<>))
            continue;
        var tableType = memberType.GetGenericArguments()[0];
        if (tableType.IsGenericParameter)
            continue;
        if (seen.Contains(tableType))
            continue;
        seen.Add(tableType);

        MetaTable metaTable;
        if (_Tables.TryGetValue(tableType, out metaTable))
            yield return metaTable;
        else
            yield return AddTableType(tableType);
    }
}

That loop iterates 16,718 times.
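GetMembers returns methods, constructors, and property accessors as well as fields and properties, which may be why the loop count is roughly double the table count. One idea worth trying is to scan only the properties and cache the result per context type, so the reflection pass runs once rather than on every query. The sketch below is an assumption-laden illustration, not DB_Linq's API: TableTypeCache, DemoContext, Account, and Branch are hypothetical, Table<T> is a stand-in, and unlike the original loop it ignores fields and skips the generic-parameter check.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Reflection;
using System.Threading;

// Hypothetical stand-in for the provider's Table<T>.
public sealed class Table<T> { }

public static class TableTypeCache
{
    private static readonly ConcurrentDictionary<Type, Type[]> Cache =
        new ConcurrentDictionary<Type, Type[]>();

    // Number of reflection scans actually performed (demo bookkeeping only).
    public static int ScanCount;

    // Returns the row types exposed as Table<T> properties on the context type.
    // Under contention GetOrAdd may run the scan more than once; only one result is kept.
    public static Type[] GetTableTypes(Type contextType)
    {
        return Cache.GetOrAdd(contextType, Scan);
    }

    private static Type[] Scan(Type contextType)
    {
        Interlocked.Increment(ref ScanCount);
        const BindingFlags scope = BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.Public | BindingFlags.NonPublic;
        return contextType.GetProperties(scope)          // properties only, no methods or accessors
            .Select(p => p.PropertyType)
            .Where(t => t.IsGenericType && t.GetGenericTypeDefinition() == typeof(Table<>))
            .Select(t => t.GetGenericArguments()[0])
            .Distinct()
            .ToArray();
    }
}

public class Account { }
public class Branch { }

// Tiny stand-in for a generated data context.
public class DemoContext
{
    public Table<Account> Accounts { get; set; }
    public Table<Branch> Branches { get; set; }
    public string Name { get; set; }
}
```

Calling TableTypeCache.GetTableTypes(typeof(DemoContext)) twice returns the same array and scans once. Whether this helps in practice depends on how much of the delay comes from the member scan itself versus the per-table work in AddTableType, which the timings above do not separate.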
