Removing duplicates from an array of structs

Posted 2024-11-29 07:14:33 · 1593 characters · 0 views · 0 comments

I can't figure out how to remove duplicate entries from an array of structs.

I have this struct:

public struct stAppInfo
{
    public string sTitle;
    public string sRelativePath;
    public string sCmdLine;
    public bool bFindInstalled;
    public string sFindTitle;
    public string sFindVersion;
    public bool bChecked;
}

I have since changed the stAppInfo struct to a class, thanks to Jon Skeet.

The code is like this: (short version)

stAppInfo[] appInfo = new stAppInfo[listView1.Items.Count];

int i = 0;
foreach (ListViewItem item in listView1.Items)
{
    appInfo[i].sTitle = item.Text;
    appInfo[i].sRelativePath = item.SubItems[1].Text;
    appInfo[i].sCmdLine = item.SubItems[2].Text;
    appInfo[i].bFindInstalled = (item.SubItems[3].Text.Equals("Sí")) ? true : false;
    appInfo[i].sFindTitle = item.SubItems[4].Text;
    appInfo[i].sFindVersion = item.SubItems[5].Text;
    appInfo[i].bChecked = (item.SubItems[6].Text.Equals("Sí")) ? true : false;
    i++;
}

I need the appInfo array to be unique on the combination of the sTitle and sRelativePath members; the other members can contain duplicates.

EDIT:

Thanks to all for the answers, but this application is "portable": I need just the .exe file and I don't want to add other files such as referenced *.dll assemblies. So please, no external references. This app is intended to be run from a pendrive.

All data comes from a *.ini file. What I do is: (pseudocode)

ReadFile()
FillDataFromFileInAppInfoArray()
DeleteDuplicates()
FillListViewControl()

When I want to save that data into a file I have these options:

  1. Using the ListView data
  2. Using the appInfo array (is this faster?)
  3. Any other?

EDIT2:

Big thanks to Jon Skeet and Michael Hays. Thanks for your time, guys!

Comments (4)

仅此而已 2024-12-06 07:14:33

Firstly, please don't use mutable structs. They're a bad idea in all kinds of ways.

Secondly, please don't use public fields. Fields should be an implementation detail - use properties.

Thirdly, it's not at all clear to me that this should be a struct. It looks rather large, and not particularly "a single value".

Fourthly, please follow the .NET naming conventions so your code fits in with all the rest of the code written in .NET.

Fifthly, you can't remove items from an array, as arrays are created with a fixed size... but you can create a new array with only unique elements.
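
To make the "new array" point concrete, here is a minimal sketch that uses only types shipped with the .NET Framework (no LINQ, no extra DLL). It assumes the type has already been turned into a class; the names AppInfo, Dedup and RemoveDuplicates are made up for this illustration, and the '\0' separator assumes that character never appears in a title or path.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for the question's type, reduced to three members.
public class AppInfo
{
    public string Title { get; set; }
    public string RelativePath { get; set; }
    public string CmdLine { get; set; }
}

public static class Dedup
{
    // Returns a NEW array holding the first item seen for each (Title, RelativePath) pair.
    public static AppInfo[] RemoveDuplicates(AppInfo[] source)
    {
        var seenKeys = new HashSet<string>();
        var result = new List<AppInfo>();
        foreach (AppInfo item in source)
        {
            // Assumption: '\0' never occurs in a title or path, so it is a safe separator.
            string key = item.Title + "\0" + item.RelativePath;
            if (seenKeys.Add(key))   // Add returns false when the key is already present
                result.Add(item);
        }
        return result.ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        var input = new[]
        {
            new AppInfo { Title = "A", RelativePath = "a.exe", CmdLine = "/x" },
            new AppInfo { Title = "A", RelativePath = "a.exe", CmdLine = "/y" }, // same key: dropped
            new AppInfo { Title = "A", RelativePath = "b.exe", CmdLine = "/x" }, // same title only: kept
        };
        AppInfo[] unique = Dedup.RemoveDuplicates(input);
        Console.WriteLine(unique.Length); // prints 2
    }
}
```

The original array is left untouched; the caller simply switches to the returned one.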

LINQ to Objects will let you do that already using GroupBy as shown by Albin, but a slightly neater (in my view) approach is to use DistinctBy from MoreLINQ:

var unique = appInfo.DistinctBy(x => new { x.sTitle, x.sRelativePath })
                    .ToArray();

This is generally more efficient than GroupBy, and also more elegant in my view.

Personally I generally prefer using List<T> over arrays, but the above will create an array for you.

Note that with this code there can still be two items with the same title, and there can still be two items with the same relative path - there just can't be two items with the same relative path and title. If there are duplicate items, DistinctBy will always yield the first such item from the input sequence.
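
If pulling in MoreLINQ is not an option, a helper with the same "first item per key wins" behaviour is small enough to paste into the project. The following is only a sketch of the idea, not MoreLINQ's actual source; the name DistinctByKey is made up so it cannot clash with any existing DistinctBy method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper: lazily yields the first element seen for each distinct key.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (T element in source)
        {
            if (seenKeys.Add(keySelector(element)))
                yield return element;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new[]
        {
            new { Title = "A", Path = "a.exe", Cmd = "first" },
            new { Title = "A", Path = "b.exe", Cmd = "second" }, // same title, other path: kept
            new { Title = "A", Path = "a.exe", Cmd = "third" },  // same title AND path: dropped
        };

        // Anonymous types compare member by member, so they work as composite keys.
        var unique = items.DistinctByKey(x => new { x.Title, x.Path }).ToArray();
        foreach (var item in unique)
            Console.WriteLine(item.Cmd); // prints "first" then "second"
    }
}
```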

EDIT: Just to satisfy Michael, you don't actually need to create an array to start with, or create an array afterwards if you don't need it:

var query = listView1.Items
                     .Cast<ListViewItem>()
                     .Select(item => new stAppInfo
                             {
                                 sTitle = item.Text,
                                 sRelativePath = item.SubItems[1].Text,
                                 sCmdLine = item.SubItems[2].Text,
                                 bFindInstalled = item.SubItems[3].Text == "Sí",
                                 sFindTitle = item.SubItems[4].Text,
                                 sFindVersion = item.SubItems[5].Text,
                                 bChecked = item.SubItems[6].Text == "Sí"
                             })
                     .DistinctBy(x => new { x.sTitle, x.sRelativePath });

That will give you an IEnumerable<stAppInfo> which is lazily streamed. Note, however, that if you iterate over it more than once, it will iterate over listView1.Items the same number of times, performing the same uniqueness comparisons each time.
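
The re-iteration cost is easy to see with a counter. In this sketch, LazyDemo.Source() is a made-up stand-in for listView1.Items, and the built-in Distinct stands in for DistinctBy; the point is only that each pass over the query walks the source again, while a ToList() result can be reused freely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Number of times the source sequence has been walked from the start.
    public static int Reads;

    // Hypothetical stand-in for listView1.Items.
    public static IEnumerable<string> Source()
    {
        Reads++;
        yield return "A";
        yield return "A";
        yield return "B";
    }

    public static void Main()
    {
        IEnumerable<string> query = Source().Distinct(); // nothing has run yet

        Console.WriteLine(query.Count()); // first pass over the source
        Console.WriteLine(query.Count()); // second pass, same work again
        Console.WriteLine(Reads);         // prints 2

        List<string> cached = query.ToList(); // third and final pass
        Console.WriteLine(cached.Count);      // reads the list, not the source
        Console.WriteLine(cached.Count);
        Console.WriteLine(Reads);             // prints 3
    }
}
```

So if the result is needed more than once, materialising it with ToList() or ToArray() is the usual way out.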

I prefer this approach over Michael's as it makes the "distinct by" columns very clear in semantic meaning, and removes the repetition of the code used to extract those columns from a ListViewItem. Yes, it involves building more objects, but I prefer clarity over efficiency until benchmarking has proved that the more efficient code is actually required.

灵芸 2024-12-06 07:14:33

What you need is a Set. It ensures that the items entered into it are unique (based on some qualifier which you will set up). Here is how it is done:

  • First, change your struct to a class. There is really no getting around that.

  • Second, provide an implementation of IEqualityComparer<stAppInfo>. It may be a hassle, but it is the thing that makes your set work (which we'll see in a moment):

    public class AppInfoComparer : IEqualityComparer<stAppInfo>
    {
        public bool Equals(stAppInfo x, stAppInfo y) {
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return Equals(x.sTitle, y.sTitle) && Equals(x.sRelativePath,
               y.sRelativePath);
        }
    
        // this part is a pain, but this one is already written 
        // specifically for your question.
        public int GetHashCode(stAppInfo obj) {
            unchecked {
                return ((obj.sTitle != null 
                    ? obj.sTitle.GetHashCode() : 0) * 397)
                    ^ (obj.sRelativePath != null 
                    ? obj.sRelativePath.GetHashCode() : 0);
            }
        }
    }
    
  • Then, when it is time to make your set, do this:

    var appInfoSet = new HashSet<stAppInfo>(new AppInfoComparer());
    foreach (ListViewItem item in listView1.Items)
    {
        var newItem = new stAppInfo { 
            sTitle = item.Text,
            sRelativePath = item.SubItems[1].Text,
            sCmdLine = item.SubItems[2].Text,
            bFindInstalled = (item.SubItems[3].Text.Equals("Sí")) ? true : false,
            sFindTitle = item.SubItems[4].Text,
            sFindVersion = item.SubItems[5].Text,
            bChecked = (item.SubItems[6].Text.Equals("Sí")) ? true : false};
        appInfoSet.Add(newItem);
    }
    
  • appInfoSet now contains a collection of stAppInfo objects with unique Title/Path combinations, as per your requirement. If you must have an array, do this:

    stAppInfo[] appInfo = appInfoSet.ToArray();
    

Note: I chose this implementation because it looks like the way you are already doing things. It has an easy-to-read foreach loop (though I do not need the counter variable). It does not involve LINQ (which can be troublesome if you aren't familiar with it). It requires no external libraries beyond what the .NET Framework provides. And finally, it provides an array, just as you asked. As for reading the data in from an INI file, hopefully you can see that the only thing that will change is your foreach loop.

Update

Hash codes can be a pain. You might have been wondering why you need to compute them at all. After all, couldn't you just compare the values of the title and relative path on each insert? Of course you could, and that's exactly how another set, called SortedSet, works. SortedSet makes you implement IComparer<stAppInfo> in the same way that I implemented IEqualityComparer<stAppInfo> above.

So, in this case, AppInfoComparer would look like this:

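private class AppInfoComparer : IComparer<stAppInfo>
{
   // returns a negative value if x < y, a positive value if x > y, or 0 if they are equal
   public int Compare(stAppInfo x, stAppInfo y)
   {
      // string.Compare tolerates null strings; Ordinal matches the Equals-based comparer above
      var comparison = string.Compare(x.sTitle, y.sTitle, StringComparison.Ordinal);
      if (comparison != 0) return comparison;
      return string.Compare(x.sRelativePath, y.sRelativePath, StringComparison.Ordinal);
   }
}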

And then the only other change you need to make is to use SortedSet instead of HashSet:

var appInfoSet = new SortedSet<stAppInfo>(new AppInfoComparer());
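
As a rough, self-contained illustration of how that behaves, the sketch below uses a cut-down class (AppInfo and AppInfoOrderComparer are names invented for this example, not the question's types). Add returns false for an item whose title and path are already in the set, and enumeration comes back in comparer order.

```csharp
using System;
using System.Collections.Generic;

// Cut-down, hypothetical version of the question's type.
public class AppInfo
{
    public string Title { get; set; }
    public string RelativePath { get; set; }
}

// Orders by title first, then by relative path (ordinal comparison, null-tolerant).
public class AppInfoOrderComparer : IComparer<AppInfo>
{
    public int Compare(AppInfo x, AppInfo y)
    {
        int byTitle = string.CompareOrdinal(x.Title, y.Title);
        return byTitle != 0 ? byTitle : string.CompareOrdinal(x.RelativePath, y.RelativePath);
    }
}

public static class Program
{
    public static void Main()
    {
        var set = new SortedSet<AppInfo>(new AppInfoOrderComparer());

        Console.WriteLine(set.Add(new AppInfo { Title = "B", RelativePath = "b.exe" })); // True
        Console.WriteLine(set.Add(new AppInfo { Title = "A", RelativePath = "a.exe" })); // True
        Console.WriteLine(set.Add(new AppInfo { Title = "B", RelativePath = "b.exe" })); // False: key already present

        foreach (AppInfo info in set)
            Console.WriteLine(info.Title); // prints "A" then "B": sorted, not insertion order
    }
}
```

One side effect worth knowing: unlike the HashSet version, the items no longer come out in the order they were read.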

It's so much easier, in fact, that you are probably wondering what gives. The reason most people choose HashSet over SortedSet is performance. But you should balance that against how much you actually care, since you'll be the one maintaining that code. I personally use a tool called ReSharper, which is available for Visual Studio, and it generates these hash functions for me, because I think computing them is a pain, too.

(I'll talk about the complexity of the two approaches, but if you already know it, or are not interested, feel free to skip it.)

Each insertion into a SortedSet costs O(log n). That is to say, each time you enter a new item, it effectively goes to the halfway point of your set and compares. If it doesn't find your entry, it goes to the halfway point between its last guess and the group to the left or right of that guess, quickly whittling down the places for your element to hide. For a million entries, this takes about 20 attempts. Not bad at all.

But if you've chosen a good hashing function, HashSet can do the same job, on average, in one comparison, which is O(1). And before you think 20 is not really that big a deal compared to 1 (after all, computers are pretty quick), remember that you had to insert those million items: while HashSet took about a million attempts to build that set, SortedSet took on the order of twenty million.

But there is a price: HashSet breaks down (very badly) if you choose a poor hashing function. If the hash codes for lots of items are not unique, those items collide in the HashSet, which then has to try again and again. If lots of items collide on the exact same hash code, they retrace each other's steps, and you will be waiting a long time. In the worst case the millionth entry takes about a million attempts on its own, and building the whole set takes on the order of a million times a million attempts: HashSet has devolved into O(n^2).

What's important with big-O notation (which is what O(1), O(log n), and O(n^2) are) is how quickly the expression in parentheses grows as you increase n. Slow growth or no growth is best. Quick growth is sometimes unavoidable. For a dozen or even a hundred items, the difference may be negligible. But if you can get in the habit of writing efficient functions as easily as the alternatives, it's worth conditioning yourself to do so, because problems are cheapest to correct closest to the point where you created them.
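
To get a feel for how badly a poor hash function hurts, here is a small experiment sketch. CountingComparer and HashDemo are made-up names; the comparer counts how often the set has to fall back to Equals, and its constantHash switch deliberately simulates the worst case where every item gets the same hash code. The exact counts depend on the runtime, so none are claimed here, but the constant-hash run should grow roughly with the square of n.

```csharp
using System;
using System.Collections.Generic;

// Counts how often the set has to call Equals to resolve a lookup.
public class CountingComparer : IEqualityComparer<string>
{
    private readonly bool constantHash;
    public long EqualsCalls;

    public CountingComparer(bool constantHash) { this.constantHash = constantHash; }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    // constantHash == true simulates the worst possible hash function: everything collides.
    public int GetHashCode(string s) { return constantHash ? 0 : s.GetHashCode(); }
}

public static class HashDemo
{
    // Builds a set of n distinct strings and reports how many Equals calls that took.
    public static long CountEqualsCalls(int n, bool constantHash)
    {
        var comparer = new CountingComparer(constantHash);
        var set = new HashSet<string>(comparer);
        for (int i = 0; i < n; i++)
            set.Add("item" + i);
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("good hash:     " + CountEqualsCalls(2000, false));
        Console.WriteLine("constant hash: " + CountEqualsCalls(2000, true));
    }
}
```

Doubling n in the constant-hash run is a quick way to watch the quadratic growth for yourself.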

魂ガ小子 2024-12-06 07:14:33

Use LINQ to Objects: group by the things that should be unique, then select the first item in each group.

var noDupes = appInfo.GroupBy(
    x => new { x.sTitle, x.sRelativePath })
    .Select(g => g.First()).ToArray();

怪异←思 2024-12-06 07:14:33

!!! An array of structs (a value type) combined with sorting or any kind of search through non-generic APIs ==> a lot of boxing and unboxing operations.

  1. I would suggest sticking with Jon's and Henk's recommendations: make it a class and use the generic List<T>.
  2. Use LINQ GroupBy or DistinctBy. For me the built-in GroupBy is much simpler to use, but it is also interesting to look at another popular library; perhaps it will give you some insights.

BTW, also take a look at LambdaComparer; it will make your life easier each time you need this kind of in-place sorting, searching, etc.
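
In case the LambdaComparer idea is unfamiliar: it is a small class that builds a comparer from a lambda, so you don't write a new comparer class for every key. The sketch below is my own guess at the shape of such a helper, not the code of any particular library; KeyEqualityComparer is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: two items are equal when the keys extracted by the lambda are equal.
public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return keyComparer.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        return keyComparer.GetHashCode(keySelector(obj));
    }
}

public static class Program
{
    public static void Main()
    {
        var paths = new[] { "a.exe", "A.EXE", "b.exe" };

        // Treat paths as equal when they match ignoring case.
        var comparer = new KeyEqualityComparer<string, string>(p => p.ToUpperInvariant());

        string[] unique = paths.Distinct(comparer).ToArray();
        Console.WriteLine(unique.Length); // prints 2
    }
}
```

The same comparer instance can be handed to HashSet<T>, Dictionary<TKey, TValue>, or LINQ's Distinct and GroupBy overloads that accept an IEqualityComparer<T>.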
