Remove duplicates from one table by field using LINQ

Posted on 2024-08-20 23:44:27

I need to keep in a DataTable only the records whose dates are not yet present in the database.

So I read all existing dates using a stored procedure (is this the right approach?):

SELECT DISTINCT CAST(S.[date] AS DATE) -- original date is DATETIME2(0)
FROM ...
WHERE ...

and load them into a DataTable:

var tableDate = new DataTable();
new SqlDataAdapter(command).Fill(tableDate);

How can I now remove all the unnecessary rows from another table? I think LINQ could help, but I'm not sure how.

Comments (4)

想你只要分分秒秒 2024-08-27 23:44:27

I'm looking at your answer, which you say works, and you just want to know how to do it in a "single LINQ query." Keep in mind that these queries all have deferred execution, so the following two queries are functionally equivalent:

var q =
    from d in dates
    select d.Field<DateTime>("date");
return
    (from r in records
     where !q.Contains(r.Field<DateTime>("date"))
     select r).CopyToDataTable();

And:

return
    (from r in records
     where !dates
         .Select(d => d.Field<DateTime>("date"))
         .Contains(r.Field<DateTime>("date"))
     select r).CopyToDataTable();

The second version is a lot harder to read, but nevertheless, it is "one query."
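
Because the inner query is deferred, `Contains` walks the dates table again for every record. The following is a minimal sketch of the same filter with the existing dates materialized once into a `HashSet`. It assumes both tables have a non-null `DateTime` column named "date" and that only the calendar day matters (hence `.Date`); the names `DateFilter` and `KeepNewDates` are hypothetical. It also guards against `CopyToDataTable` throwing an `InvalidOperationException` when no rows survive the filter.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DateFilter
{
    // Hypothetical helper: returns a new table containing only the rows of
    // `records` whose calendar date does not occur in `dates`.
    public static DataTable KeepNewDates(DataTable records, DataTable dates)
    {
        // Materialize the existing dates once so each lookup is a hash probe
        // instead of a fresh enumeration of the dates table.
        var existing = new HashSet<DateTime>(
            dates.AsEnumerable().Select(d => d.Field<DateTime>("date").Date));

        var kept = records.AsEnumerable()
            .Where(r => !existing.Contains(r.Field<DateTime>("date").Date))
            .ToList();

        // CopyToDataTable throws on an empty sequence, so fall back to an
        // empty table with the same schema.
        return kept.Count > 0 ? kept.CopyToDataTable() : records.Clone();
    }
}
```

Whether the `.Date` normalization is needed depends on whether the records table stores a time component; if it never does, it can be dropped.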


Having said this, none of these examples really seem to match your question title, which suggests that you are trying to remove duplicate rows. If that is indeed what you are trying to do, here is a method that will do that:

static DataTable RemoveDuplicates(DataTable dt)
{
    return
        (from row in dt.Rows.OfType<DataRow>()
         group row by row.Field<string>("date") into g
         select g
            .OrderBy(r => r.Field<int>("ID"))
            .First()).CopyToDataTable();
}

If you don't care which of the duplicates is kept, you can simply remove the OrderBy line. You can test this as follows:

static void Main(string[] args)
{
    using (DataTable original = CreateSampleTable())
    using (DataTable filtered = RemoveDuplicates(original))
    {
        DumpTable(filtered);
    }
    Console.ReadKey();
}

static DataTable CreateSampleTable()
{
    DataTable dt = new DataTable();
    dt.Columns.Add("ID", typeof(int));
    dt.Columns.Add("Code", typeof(string));
    dt.Columns.Add("Name", typeof(string));
    dt.Rows.Add(1, "123", "Alice");
    dt.Rows.Add(2, "456", "Bob");
    dt.Rows.Add(3, "456", "Chris");
    dt.Rows.Add(4, "789", "Dave");
    dt.Rows.Add(5, "123", "Elen");
    dt.Rows.Add(6, "123", "Frank");
    return dt;
}

static void DumpTable(DataTable dt)
{
    foreach (DataRow row in dt.Rows)
    {
        Console.WriteLine("{0},{1},{2}",
            row.Field<int>("ID"),
            row.Field<string>("Code"),
            row.Field<string>("Name"));
    }
}

(just replace "date" with "Code" in the RemoveDuplicates method for this example)

Hopefully one of these answers your question. Otherwise, I think you will have to be clearer about your requirements.

挽容 2024-08-27 23:44:27

You could use Except()

return records.Except(dates);

UPDATED:
If your DataTable has typed fields, then it should be like the following:

var excluded = arbDates.Rows.OfType<System.Data.DataRow>().Select(a => a[0])
.Except(excDates.Rows.OfType<System.Data.DataRow>().Select(e => e[0]));

Otherwise you could convert the values:

var excluded = arbDates.Rows.OfType<System.Data.DataRow>()
.Select(a => Convert.ToDateTime(a[0].ToString()))
.Except(
excDates.Rows.OfType<System.Data.DataRow>()
.Select(e => Convert.ToDateTime(e[0].ToString())));
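
One caveat when `Except` is applied to the rows themselves rather than to projected values: `DataRow` instances are compared by reference by default, so rows from two different tables never match unless a comparer is supplied. The sketch below uses the built-in `DataRowComparer.Default`, which compares rows column by column. It assumes both tables have the same columns, which may not hold for your real schema; the names `RowExcept` and `RowsNotIn` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowExcept
{
    // Hypothetical helper: rows of `source` that have no value-equal row in
    // `excluded`. Both tables are assumed to share the same column layout.
    public static List<DataRow> RowsNotIn(DataTable source, DataTable excluded)
    {
        return source.AsEnumerable()
            // DataRowComparer.Default compares column values, not references.
            .Except(excluded.AsEnumerable(), DataRowComparer.Default)
            .ToList();
    }
}
```

Note that `Except` has set semantics, so value-equal duplicates within `source` are also collapsed into a single row.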

飘落散花 2024-08-27 23:44:27

Your SQL statement looks fine. As I understand it, you are casting to DATE so that every value carries the default midnight time. The dates in the other table must therefore also have a midnight time component for the comparison to work. If they don't, you can still use the code below, but you must append the .Date property wherever a tableResult row's field is referenced. Also, I have used Field<DateTime>(0), but depending on your query, and based on your earlier example, you may need to use Field<DateTime>("date").

There's no need for a custom comparer. To merge your LINQ queries into a single query you could simply use the let keyword and carry the intermediate result through the query and reference it.

Give this a try:

var tableDate = new DataTable();
new SqlDataAdapter(command).Fill(tableDate);

// this is the other table that has other dates, so populate as needed
var tableResult = new DataTable();

var newTable =
    (from row in tableResult.AsEnumerable()
    let uniqueRows = tableResult.AsEnumerable().Select(r => r.Field<DateTime>(0))
                                .Except(tableDate.AsEnumerable().Select(r => r.Field<DateTime>(0)))
    where uniqueRows.Contains(row.Field<DateTime>(0))
    select row).CopyToDataTable();

In dot notation the query would be:

var newTable = tableResult.AsEnumerable()
    .Select(row => new
    {
        Row = row,
        UniqueRows =  tableResult.AsEnumerable()
                                 .Select(r => r.Field<DateTime>(0))
                                 .Except(tableDate.AsEnumerable().Select(r => r.Field<DateTime>(0)))
    })
    .Where(item => item.UniqueRows.Contains(item.Row.Field<DateTime>(0)))
    .Select(item => item.Row)
    .CopyToDataTable();

Instead of tableResult.AsEnumerable() you could use tableResult.Rows.Cast<DataRow>() or tableResult.Rows.OfType<DataRow>(). The results are the same between all these approaches.

If you want to remove duplicates from the existing table (rather than copy it to a new table), you could remove the items returned by the Intersect method from the table:

var commonDates = tableDate.AsEnumerable().Select(row => row.Field<DateTime>(0))
                           .Intersect(tableResult.AsEnumerable().Select(row => row.Field<DateTime>(0)));

for (int index = tableResult.Rows.Count - 1; index >= 0; index--)
{
    if (commonDates.Contains(tableResult.Rows[index].Field<DateTime>(0)))
    {
        tableResult.Rows.RemoveAt(index);
    }
}
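
In the loop above, `commonDates` is a deferred `Intersect` query, so every `Contains` call re-runs it against both tables. A sketch of the same in-place removal with the existing dates materialized once is shown here. It assumes the date is column 0 of both tables, that the values are non-null, and that only the calendar day matters; the names `InPlaceFilter` and `RemoveExistingDates` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class InPlaceFilter
{
    // Hypothetical helper: deletes from `tableResult` every row whose date
    // already occurs in `tableDate`, and returns how many rows were removed.
    public static int RemoveExistingDates(DataTable tableResult, DataTable tableDate)
    {
        // Evaluate the set of existing dates once, up front.
        var existing = new HashSet<DateTime>(
            tableDate.AsEnumerable().Select(r => r.Field<DateTime>(0).Date));

        int removed = 0;
        // Iterate backwards so that removing a row does not shift the
        // indexes of the rows still to be visited.
        for (int index = tableResult.Rows.Count - 1; index >= 0; index--)
        {
            if (existing.Contains(tableResult.Rows[index].Field<DateTime>(0).Date))
            {
                tableResult.Rows.RemoveAt(index);
                removed++;
            }
        }
        return removed;
    }
}
```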

凉栀 2024-08-27 23:44:27

As I understand the problem, you are trying to de-dup data coming from some import. You may not need to do this using LINQ. Although the post title suggests LINQ, you later question whether LINQ might be the best solution and, given what we know, I think you could do this using a single Insert statement.

First, I'd suggest bulk copying the data into a temporary location in the db (if you are not already doing this) like so:

Create Table BulkCopyData
(
    Id int not null identity(1,1)
    , Date DateTime2 not null
    , ...
)

One of the advantages of bulk copying into a temporary location is that you can add indexes and such to speed up the cleaning process. To de-dup the data, you could then run a query like so:

Insert DestinationData(...)
Select ...
From BulkCopyData As BCD
Where Id = (
            Select Min(BCD2.[Id])
            From BulkCopyData As BCD2
            Where Cast(BCD2.[Date] As Date) = Cast(BCD.[Date] As Date)
            )

Or

Insert DestinationData(...)
Select ...
From BulkCopyData As BCD
Where Id = (
            Select Min(BCD2.[Id])
            From BulkCopyData As BCD2
            Where DateDiff(d, BCD.[Date], BCD2.[Date]) = 0
            )

For each calendar date, this pulls the first row it finds (the one with the lowest Id). This is obviously somewhat arbitrary, but to refine it we would need to know more about the data structure and requirements.
