Getting duplicates on two columns with LINQ


LINQ drives me crazy. Why does the following query not return the duplicates, although it works with only one identifier? Where is my error?

' generate some test-data '
Dim source As New DataTable
source.Columns.Add(New DataColumn("RowNumber", GetType(Int32)))
source.Columns.Add(New DataColumn("Value1", GetType(Int32)))
source.Columns.Add(New DataColumn("Value2", GetType(Int32)))
source.Columns.Add(New DataColumn("Text", GetType(String)))
Dim rnd As New Random()
For i As Int32 = 1 To 100
    Dim newRow = source.NewRow
    Dim value = rnd.Next(1, 20)
    newRow("RowNumber") = i
    newRow("Value1") = value
    newRow("Value2") = (value + 1)
    newRow("Text") = String.Format("RowNumber{0}-Text", i)
    source.Rows.Add(newRow)
Next
' following query does not work, it always has Count=0 '
' although it works with only one identifier '
Dim dupIdentifiers = From row In source
         Group row By grp = New With {.Val1 = row("Value1"), .Val2 = row("Value2")}
         Into Group
         Where Group.Count > 1
         Select idGroup = New With {grp.Val1, grp.Val2, Group.Count}

Edit: The following is the complete solution, thanks to @Jon Skeet's answer :)

Dim dupKeys = From row In source
        Group row By grp = New With {Key .Val1 = CInt(row("Value1")), Key .Val2 = CInt(row("Value2"))}
        Into Group Where Group.Count > 1
        Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Dim dupRows = From row In source
        Join dupKey In dupKeys 
        On row("RowNumber") Equals dupKey 
        Select row

If dupRows.Any Then
    ' create a new DataTable from the first duplicate rows '
    Dim dest = dupRows.CopyToDataTable
End If

The main problem with the grouping was that I had to make the grouped fields Key properties.
The next problem in my code above was getting the duplicate rows from the original table.
Because nearly every row has a duplicate (according to the two fields), the resulting DataTable contained 99 of 100 rows and not only the 19 duplicate values. I needed to select only the first row of each duplicate group and join those rows with the original table on the PK.

Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Although this works in my case, maybe someone can explain how to select only the duplicates from the original table if I had only the composite key to go on.


Edit: I've answered the last part of the question myself, so here is all I need:

Dim dups = From row In source
         Group By grp = New With {Key .Value1 = CInt(row("Value1")), Key .Value2 = CInt(row("Value2"))}
         Into Group Where Group.Count > 1
         Let Text = Group.First.Item("Text")
         Select Group.First

If dups.Any Then
    Dim dest = dups.CopyToDataTable
End If

I needed the Let keyword in order to keep the other column(s) in the same context and return only the first row of each group of duplicates. This way I can use CopyToDataTable to create a DataTable from the duplicate rows.

Only a few lines of code overall (I can skip the second query that looks up the rows in the original table) to find duplicates on multiple columns and create a DataTable from them.
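
The query above returns one row per duplicated pair. If every row that belongs to a duplicated pair is wanted instead, one possible approach is to flatten the filtered groups with a second From clause. The following is only a sketch under assumptions: the module name, the helper functions and the four hard-coded rows are made up for illustration, and the table is reduced to the three Int32 columns.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Data
Imports System.Linq

Module AllDuplicateRowsDemo
    ' Builds a small fixed table (hypothetical data) with the columns used above.
    Function BuildTable() As DataTable
        Dim table As New DataTable
        table.Columns.Add("RowNumber", GetType(Integer))
        table.Columns.Add("Value1", GetType(Integer))
        table.Columns.Add("Value2", GetType(Integer))
        table.Rows.Add(1, 5, 6)
        table.Rows.Add(2, 5, 6)
        table.Rows.Add(3, 7, 8)
        table.Rows.Add(4, 5, 6)
        Return table
    End Function

    ' Returns every row whose (Value1, Value2) pair occurs more than once.
    Function AllDuplicateRows(table As DataTable) As DataRow()
        Return (From row In table.AsEnumerable()
                Group row By grp = New With {Key .Value1 = row.Field(Of Integer)("Value1"),
                                             Key .Value2 = row.Field(Of Integer)("Value2")}
                Into rows = Group
                Where rows.Count() > 1
                From dupRow In rows
                Select dupRow).ToArray()
    End Function

    Sub Main()
        ' Print the RowNumber of each row that is part of a duplicated pair.
        For Each row In AllDuplicateRows(BuildTable())
            Console.WriteLine(row.Field(Of Integer)("RowNumber"))
        Next
    End Sub
End Module
```

In this sample, rows 1, 2 and 4 share the pair (5, 6), so those three rows come back, while row 3 is dropped. On .NET Framework this needs a reference to System.Data.DataSetExtensions, the same one CopyToDataTable relies on.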


Comments (2)

∝单色的世界 2024-12-13 05:05:26

The problem is the way anonymous types work in VB - they're mutable by default; only Key properties are included for hashing and equality. Try this:

Group row By grp = New With {Key .Val1 = row("Value1"), Key .Val2 = row("Value2")}

(In C# this wouldn't be a problem - anonymous types in C# are always immutable in all properties.)
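
To make the difference visible, here is a minimal sketch that groups the same three value pairs twice, once without and once with Key properties. The module name, the function names and the sample pairs are assumptions made up for this illustration.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Linq

Module KeyPropertyDemo
    ' Groups on an anonymous type WITHOUT Key properties.
    ' Such an instance is equal only to itself, so no two pairs share a group.
    Function CountGroupsWithoutKey(pairs As Integer()()) As Integer
        Return pairs.GroupBy(Function(p) New With {.Val1 = p(0), .Val2 = p(1)}).Count()
    End Function

    ' Groups on an anonymous type whose properties are all Key properties.
    ' Equality and hashing now use the values of Val1 and Val2.
    Function CountGroupsWithKey(pairs As Integer()()) As Integer
        Return pairs.GroupBy(Function(p) New With {Key .Val1 = p(0), Key .Val2 = p(1)}).Count()
    End Function

    Sub Main()
        Dim pairs = {New Integer() {1, 2}, New Integer() {1, 2}, New Integer() {3, 4}}
        Console.WriteLine(CountGroupsWithoutKey(pairs)) ' 3 groups, so no group has Count > 1
        Console.WriteLine(CountGroupsWithKey(pairs))    ' 2 groups, the pair (1, 2) has Count = 2
    End Sub
End Module
```

Without Key, every row ends up in its own group, which is why the Where Group.Count > 1 filter in the question never matches anything.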

英雄似剑 2024-12-13 05:05:26

This is what I use to find rows that are duplicated across two columns of an EF table, using LINQ with C#:

    var DuplicatesFoundInTable = entities.LocationDatas
        // Ignore rows where either column is null or empty.
        .Where(c => c.TrailerNumber != null && c.TrailerNumber != string.Empty
                 && c.CarrierName != null && c.CarrierName != string.Empty)
        // Group on the composite key (TrailerNumber, CarrierName).
        .GroupBy(o => new { o.TrailerNumber, o.CarrierName })
        // Keep only the keys that occur more than once.
        .Where(g => g.Count() > 1)
        .Select(g => g.Key)
        .ToList();

When I want to check whether an input is a duplicate (that is, an entry with the same values in both columns already exists):

    // Check whether any row has the same TrailerNumber and CarrierName as the input.
    bool AlreadyInTableComparer = entities.LocationDatas.Any(l =>
        String.Compare(l.TrailerNumber, customer.TrailerNumber, StringComparison.InvariantCulture) == 0 &&
        String.Compare(l.CarrierName, customer.CarrierName, StringComparison.InvariantCulture) == 0);
    bool AlreadyInTable = entities.LocationDatas.Any(t =>
        t.TrailerNumber == customer.TrailerNumber && t.CarrierName == customer.CarrierName);

SQL Server, checking for duplicates (the statement that deletes them is commented out):

    WITH CTE AS
    (
        SELECT [TrailerNumber], [CarrierName],
               ROW_NUMBER() OVER (PARTITION BY TrailerNumber
                                  ORDER BY TrailerNumber, CarrierName) AS NumRows,
               ROW_NUMBER() OVER (PARTITION BY TrailerNumber, CarrierName
                                  ORDER BY CarrierName) AS NumRows2
        FROM [dbo].[LocationData] -- Please note, duplicates are shown in this table.
        WHERE TrailerNumber != '' AND CarrierName != ''
    )
    SELECT [TrailerNumber], [CarrierName], [NumRows2] FROM CTE WHERE NumRows2 > 1
    --DELETE FROM CTE WHERE NumRows2 > 1  -- Delete duplicates.

SQL Server, a validation query to cross-check the CTE filtering:

    SELECT TrailerNumber, CarrierName, COUNT(*) AS Duplicates
    FROM [dbo].[LocationData]
    WHERE TrailerNumber != '' AND CarrierName != ''
    GROUP BY TrailerNumber, CarrierName
    HAVING COUNT(*) > 1