LINQ 是否使用 DataRelations 来优化联接?

发布于 2024-07-09 23:43:42 字数 481 浏览 5 评论 0原文

我无法在任何地方找到这个问题的答案,在我开始使用 Reflector 浏览生成的代码之前,我认为值得一问:

假设我对数据集中的数据表运行以下 LINQ 查询:

var list = 
   from pr in parentTable.AsEnumerable()
   join cr in childTable.AsEnumerable() on cr.Field<int>("ParentID") equals pr.Field<int>("ID")
   where pr.Field<string>("Value") == "foo"
   select cr;

如果父表和使用所示关键字段的子表,LINQ会使用它吗? 也就是说,它会在父表中找到 Value 为“foo”的行,然后调用 GetChildRows 来投影子行吗?

或者这是我必须明确指定的事情? (如果是这样,我该怎么做?)

I can't find the answer to this anywhere, and before I start pawing through generated code with Reflector I thought it'd be worth asking:

Suppose I have the following LINQ query run against DataTables in a DataSet:

var list = 
   from pr in parentTable.AsEnumerable()
   join cr in childTable.AsEnumerable() on cr.Field<int>("ParentID") equals pr.Field<int>("ID")
   where pr.Field<string>("Value") == "foo"
   select cr;

If there's a DataRelation between the parent table and the child table that uses the key fields shown, will LINQ use it? That is, will it find the rows in the parent table for which Value is "foo" and then call GetChildRows to project the child rows?

Or is this something that I have to specify explicitly? (And if so, how do I do this?)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

演多会厌 2024-07-16 23:43:42

在 Reflector 中进行挖掘似乎没有给出任何迹象表明 LINQ 使用 DataRelations,但该代码非常难以阅读。 因此,我编写了一些性能测试,除非我错过了该测试中的某些愚蠢内容,否则结果非常明确:不,LINQ 不使用 DataRelations 和 GetChildRows() 来投影连接的行。 如果要迭代子行,则必须制定 LINQ 查询以显式调用 GetChildRows()。 这两种方法的性能都不及编写迭代 GetChildRows() 返回的数组的代码。

有点不幸,因为大型数据集上的性能差异非常显着,以至于用显式实现的代码替换 LINQ 通常是值得的,但 LINQ 通常并非如此。

我的测试代码如下。 无论 DataRelation 是在其之前还是之后创建,使用联接的 LINQ 迭代的时间保持不变(在我的计算机上大约为 580-590 毫秒)。 使用 GetChildRows() 的 LINQ 迭代大约需要 280 毫秒,

直接通过 GetChildRows() 的迭代需要不到一毫秒。 这让我非常惊讶 - 足以让我在第一次运行测试时认为代码中存在错误。 (这就是为什么我每次都写出计数 - 以确保编译器没有优化循环。)

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        DataSet ds = new DataSet();
        DataTable t1 = new DataTable();
        t1.Columns.Add(new DataColumn
                           {
                               ColumnName = "ID",
                               DataType = typeof (int),
                               AutoIncrement = true
                           });
        t1.PrimaryKey = new [] { t1.Columns["ID"]};
        ds.Tables.Add(t1);

        DataTable t2 = new DataTable();
        t2.Columns.Add(new DataColumn
        {
            ColumnName = "ID",
            DataType = typeof(int),
            AutoIncrement = true
        });
        t2.Columns.Add("ParentID", typeof(int));
        t2.PrimaryKey = new[] { t2.Columns["ID"] };
        ds.Tables.Add(t2);

        sw.Reset();
        sw.Start();
        PopulateTables(t1, t2);
        sw.Stop();
        Console.WriteLine("Populating tables took {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        var list1 = from r1 in t1.AsEnumerable()
                   join r2 in t2.AsEnumerable()
                       on r1.Field<int>("ID") equals r2.Field<int>("ParentID")
                   where r1.Field<int>("ID") == 1
                   select r2;

        sw.Reset();
        sw.Start();
        int count = 0;
        foreach (DataRow r in list1)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        ds.Relations.Add(new DataRelation("FK_t2_t1", t1.Columns["ID"], t2.Columns["ParentID"]));
        sw.Stop();
        Console.WriteLine("Creating DataRelation took {0} ms.", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        var list2 =
            from r1 in t1.AsEnumerable()
            from r2 in r1.GetChildRows("FK_t2_t1")
            where r1.Field<int>("ID") == 1
            select r2;

        count = 0;
        foreach (DataRow r in list2)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration using nested query in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        DataRow parentRow = t1.Select("ID = 1")[0];
        count = 0;
        foreach (DataRow r in parentRow.GetChildRows("FK_t2_t1"))
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed explicit iteration of child rows in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        Console.ReadLine();
    }

    private static void PopulateTables(DataTable t1, DataTable t2)
    {
        for (int count1 = 0; count1 < 1000; count1++)
        {
            DataRow r1 = t1.NewRow();
            t1.Rows.Add(r1);
            for (int count2 = 0; count2 < 1000; count2++)
            {
                DataRow r2 = t2.NewRow();
                r2["ParentID"] = r1["ID"];
                t2.Rows.Add(r2);
            }
        }
    }
} 

Digging around in Reflector didn't seem to give any indication that LINQ uses DataRelations, but that code's awfully hard to read. So I wrote a little performance test, and unless there's something dumb about the test that I missed, the results are pretty definitive: no, LINQ doesn't use DataRelations and GetChildRows() to project joined rows. If you want to iterate over child rows, you have to formulate the LINQ query to call GetChildRows() explicitly. And neither approach is anywhere near as performant as writing code that iterates over the array returned by GetChildRows().

Kind of unfortunate, because the difference in performance on large DataSets is significant enough that replacing LINQ with explicitly-implemented code is often going to be worthwhile, which generally isn't true of LINQ.

My test code is below. The timing of the LINQ iteration using the join remains the same (about 580-590 ms on my machine) irrespective of whether the DataRelation is created before or after it. The LINQ iteration that uses GetChildRows() takes around 280 ms,

The iteration directly over GetChildRows() takes under a millisecond. That's pretty surprising to me - enough so that I assumed I had a bug in the code when I first ran the test. (That's why I'm writing out the count each time - to make sure that the loops haven't been optimized out of existence by the compiler.)

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        DataSet ds = new DataSet();
        DataTable t1 = new DataTable();
        t1.Columns.Add(new DataColumn
                           {
                               ColumnName = "ID",
                               DataType = typeof (int),
                               AutoIncrement = true
                           });
        t1.PrimaryKey = new [] { t1.Columns["ID"]};
        ds.Tables.Add(t1);

        DataTable t2 = new DataTable();
        t2.Columns.Add(new DataColumn
        {
            ColumnName = "ID",
            DataType = typeof(int),
            AutoIncrement = true
        });
        t2.Columns.Add("ParentID", typeof(int));
        t2.PrimaryKey = new[] { t2.Columns["ID"] };
        ds.Tables.Add(t2);

        sw.Reset();
        sw.Start();
        PopulateTables(t1, t2);
        sw.Stop();
        Console.WriteLine("Populating tables took {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        var list1 = from r1 in t1.AsEnumerable()
                   join r2 in t2.AsEnumerable()
                       on r1.Field<int>("ID") equals r2.Field<int>("ParentID")
                   where r1.Field<int>("ID") == 1
                   select r2;

        sw.Reset();
        sw.Start();
        int count = 0;
        foreach (DataRow r in list1)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        ds.Relations.Add(new DataRelation("FK_t2_t1", t1.Columns["ID"], t2.Columns["ParentID"]));
        sw.Stop();
        Console.WriteLine("Creating DataRelation took {0} ms.", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        var list2 =
            from r1 in t1.AsEnumerable()
            from r2 in r1.GetChildRows("FK_t2_t1")
            where r1.Field<int>("ID") == 1
            select r2;

        count = 0;
        foreach (DataRow r in list2)
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed LINQ iteration using nested query in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        sw.Reset();
        sw.Start();
        DataRow parentRow = t1.Select("ID = 1")[0];
        count = 0;
        foreach (DataRow r in parentRow.GetChildRows("FK_t2_t1"))
        {
            count += r.Field<int>("ID");
        }
        sw.Stop();
        Console.WriteLine("count = {0}.", count);
        Console.WriteLine("Completed explicit iteration of child rows in {0} ms.", sw.ElapsedMilliseconds);
        Console.WriteLine();

        Console.ReadLine();
    }

    private static void PopulateTables(DataTable t1, DataTable t2)
    {
        for (int count1 = 0; count1 < 1000; count1++)
        {
            DataRow r1 = t1.NewRow();
            t1.Rows.Add(r1);
            for (int count2 = 0; count2 < 1000; count2++)
            {
                DataRow r2 = t2.NewRow();
                r2["ParentID"] = r1["ID"];
                t2.Rows.Add(r2);
            }
        }
    }
} 
风筝在阴天搁浅。 2024-07-16 23:43:42

我不这么认为。 在这种情况下,LINQ to Objects 可能只会将两侧视为常规可枚举对象,并手动执行连接(而不查看DataRelation)。

I don't think so. In this case, LINQ to Objects will probably just treat the two sides as regular enumerable objects, and do the join manually (without looking at the DataRelation).

绝影如岚 2024-07-16 23:43:42

这是一个解释:
http://msdn.microsoft.com/en-us/library/bb386969。 aspx

但在上面的示例中,您使用“JOIN”语法告诉 linq 有关父子关系的信息。 如果可能的话,我建议您创建一个强类型数据集,所有关系和连接都将为您处理。

This is an explaination:
http://msdn.microsoft.com/en-us/library/bb386969.aspx

But in your example above you're telling linq about the parent child relationship with the "JOIN" syntax. I would recommend you make a strongly-typed dataset if possible, and all the relationships and joining will be handled for you.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文