Remove duplicates by field from one table using LINQ
I need to keep in a DataTable only the records whose dates are not currently present in the database.
So I read all existing dates using a stored procedure (is this correct?):
SELECT DISTINCT CAST(S.[date] AS DATE) -- original date is DATETIME2(0)
FROM ...
WHERE ...
and load the result into a DataTable:
var tableDate = new DataTable();
new SqlDataAdapter(command).Fill(tableDate);
How can I now remove all the unnecessary rows from the other table? I think LINQ could help, but I'm not sure how.
Comments (4)
I'm looking at your answer, which you say works, and you just want to know how to do it in a "single LINQ query." Keep in mind that these queries all have deferred execution, so the following two queries are functionally equivalent:
And:
The second version is a lot harder to read, but nevertheless, it is "one query."
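The equivalence described above can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, tableResult is assumed to have a non-null DateTime column named "date", and tableDate is assumed to hold the existing dates in its first column.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DeferredQueryDemo
{
    // Two-step version: each query is only a description until it is enumerated.
    public static DataRow[] TwoSteps(DataTable tableResult, DataTable tableDate)
    {
        var existing = tableDate.AsEnumerable()
                                .Select(d => d.Field<DateTime>(0));
        var missing = tableResult.AsEnumerable()
                                 .Where(r => !existing.Contains(r.Field<DateTime>("date")));
        return missing.ToArray(); // both queries actually run here
    }

    // Single-expression version: the same logic with the first query inlined.
    public static DataRow[] OneStep(DataTable tableResult, DataTable tableDate)
    {
        return tableResult.AsEnumerable()
            .Where(r => !tableDate.AsEnumerable()
                                  .Select(d => d.Field<DateTime>(0))
                                  .Contains(r.Field<DateTime>("date")))
            .ToArray();
    }
}
```

Because nothing runs until ToArray is called, splitting the query into named steps costs nothing at run time and is usually easier to read.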
Having said this, none of these examples really seem to match your question title, which suggests that you are trying to remove duplicate rows. If that is indeed what you are trying to do, here is a method that will do that:
If you don't care which of the duplicates is removed, you can simply drop the OrderBy line. You can test this as follows (just replace "date" with "Code" in the RemoveDuplicates method for this example):
Hopefully one of these answers your question. Otherwise, I think you will have to state your requirements more clearly.
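For reference, a RemoveDuplicates method along the lines described above might look like the following minimal sketch. The class name, the keyColumn and orderColumn parameters, and the fallback to an empty clone are assumptions made for illustration; the key and ordering columns are assumed to contain no nulls.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DuplicateRemover
{
    // Returns a new table holding one row per distinct value of keyColumn.
    // Within each group, the row with the smallest orderColumn value is kept.
    public static DataTable RemoveDuplicates(DataTable table, string keyColumn, string orderColumn)
    {
        var survivors = table.AsEnumerable()
            .GroupBy(row => row[keyColumn])
            .Select(g => g
                .OrderBy(row => row[orderColumn]) // drop this line if any duplicate may survive
                .First())
            .ToList();

        // CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        return survivors.Count > 0 ? survivors.CopyToDataTable() : table.Clone();
    }
}
```

Calling RemoveDuplicates(table, "date", "Id") on a table with an integer "Id" column would keep, for each date, the row with the lowest Id.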
You could use Except():
return records.Except(dates);
UPDATED:
If your DataTable has typed fields, then it should look like the following:
var excluded = arbDates.Rows.OfType<System.Data.DataRow>().Select(a => a[0])
    .Except(excDates.Rows.OfType<System.Data.DataRow>().Select(e => e[0]));
Otherwise you could convert the values:
var excluded = arbDates.Rows.OfType<System.Data.DataRow>()
    .Select(a => Convert.ToDateTime(a[0].ToString()))
    .Except(
        excDates.Rows.OfType<System.Data.DataRow>()
            .Select(e => Convert.ToDateTime(e[0].ToString())));
Your SQL statement looks fine. As I understand it, you're casting so that every value carries the default time component of midnight. The dates in the other table must therefore be in the same form, so that dates are compared with neutral times. If they aren't, you can still use the code below, but you must add the .Date property wherever a tableResult row's field is referenced. I have also used Field<DateTime>(0), but depending on your query, and based on your earlier example, you may need to use Field<DateTime>("date").
There's no need for a custom comparer. To merge your LINQ queries into a single query, you can use the let keyword to carry the intermediate result through the query and reference it. Give this a try:
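A minimal sketch of such a query, not necessarily the exact one meant here: the class and method names are hypothetical, both tables are assumed to hold non-null DateTime values in column 0, and .Date is applied on the tableResult side in case its values carry a time of day.

```csharp
using System;
using System.Data;
using System.Linq;

public static class NewDateFilter
{
    // Returns a new table with only the rows whose date is absent from tableDate.
    public static DataTable KeepRowsWithNewDates(DataTable tableResult, DataTable tableDate)
    {
        var query =
            from row in tableResult.AsEnumerable()
            // 'let' names an intermediate value that later clauses can reference
            let day = row.Field<DateTime>(0).Date
            where !tableDate.AsEnumerable().Any(d => d.Field<DateTime>(0) == day)
            select row;

        var rows = query.ToList();
        // CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        return rows.Count > 0 ? rows.CopyToDataTable() : tableResult.Clone();
    }
}
```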
In dot notation the query would be:
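One way to write it in method syntax is sketched below, under the same assumptions (hypothetical class and method names, non-null DateTime values in column 0 of both tables). The let clause becomes a projection into an anonymous type that carries the intermediate value alongside the row.

```csharp
using System;
using System.Data;
using System.Linq;

public static class NewDateFilterDot
{
    // Returns the rows of tableResult whose date does not appear in tableDate.
    public static DataRow[] KeepRowsWithNewDates(DataTable tableResult, DataTable tableDate)
    {
        return tableResult.AsEnumerable()
            // equivalent of 'let day = ...': pair each row with its date part
            .Select(row => new { row, day = row.Field<DateTime>(0).Date })
            .Where(x => !tableDate.AsEnumerable().Any(d => d.Field<DateTime>(0) == x.day))
            .Select(x => x.row)
            .ToArray();
    }
}
```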
Instead of tableResult.AsEnumerable() you could use tableResult.Rows.Cast<DataRow>() or tableResult.Rows.OfType<DataRow>(). The results are the same for all of these approaches.
If you want to remove duplicates from the existing table (rather than copy the remaining rows to a new table), you could remove from the table the items returned by the Intersect method:
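The in-place removal could be sketched roughly as follows. The class and method names are hypothetical, and both tables are assumed to hold non-null DateTime values in column 0 that are already normalized to midnight.

```csharp
using System;
using System.Data;
using System.Linq;

public static class InPlaceCleaner
{
    // Removes from tableResult every row whose date also occurs in tableDate.
    // Returns the number of rows removed.
    public static int RemoveRowsWithExistingDates(DataTable tableResult, DataTable tableDate)
    {
        // Dates present in both tables.
        var common = tableResult.AsEnumerable().Select(r => r.Field<DateTime>(0))
            .Intersect(tableDate.AsEnumerable().Select(d => d.Field<DateTime>(0)))
            .ToList();

        // Snapshot the matching rows first; removing rows while
        // enumerating the Rows collection is not safe.
        var doomed = tableResult.AsEnumerable()
            .Where(r => common.Contains(r.Field<DateTime>(0)))
            .ToList();

        foreach (var row in doomed)
            tableResult.Rows.Remove(row);

        return doomed.Count;
    }
}
```

Rows.Remove takes the row out of the collection immediately; if the table's change tracking matters (for example before an adapter Update), calling row.Delete() instead may be the better fit.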
As I understand the problem, you are trying to de-dup data coming from some import. You may not need to do this using LINQ. Although the post title suggests LINQ, you later question whether LINQ might be the best solution and, given what we know, I think you could do this using a single Insert statement.
First, I'd suggest bulk copying the data into a temporary location in the db (if you are not already doing this) like so:
One of the advantages of bulk copying into a temporary location is that you can add indexes and such to speed up the cleaning process. To de-dup the data, you could then run a query like so:
Or
This will pull the first date it finds (the one with the lowest Id). This is obviously somewhat arbitrary but to get more refined we'd need to know more about the data structure and requirements.