Comparing arrays and getting the difference

Posted on 2024-09-12 06:59:41

How would I compare two arrays that might have different lengths and get the difference between them?

For example:

Cat cat = new Cat();
Dog dog = new Dog();
Alligator alligator = new Alligator();

Animal animals[] = { cat, dog };
Animal animals2[] = { cat, dog, alligator };

How would I compare the two arrays and get back the instance of Alligator?

Comments (5)

夜清冷一曲。 2024-09-19 06:59:41

I would suggest that your question needs to be clarified. Currently, everyone is guessing about what you are actually asking.

  • Are the arrays intended to represent sets, or lists, or something in between? In other words, does element order matter, and can there be duplicates?
  • What does "equal" mean? Does new Cat() "equal" new Cat()? Your example suggests that it does!!
  • What do you mean by the "difference"? Do you mean set difference?
  • What do you want to happen if the two arrays have the same length?
  • Is this a once-off comparison or does it occur repeatedly for the same arrays?
  • How many elements are there in the arrays (on average)?
  • Why are you using arrays at all?

Assuming that these arrays are intended to be true sets, you should probably be using HashSet instead of arrays, and using collection operations like addAll, retainAll and removeAll to calculate the set difference.

On the other hand, if the arrays are meant to represent lists, it is not at all clear what "difference" means.

If it is critical that the code runs fast, then you most certainly need to rethink your data structures. If you always start with arrays, you are not going to be able to calculate the "differences" fast ... at least in the general case.

Finally, if you are going to use anything that depends on the equals(Object) method (and that includes any of the Java collection types), you really need to have a clear understanding of what "equals" is supposed to mean in your application. Are all Cat instances equal? Are they all different? Are some Cat instances equal and others not? If you don't figure this out, and implement the equals and hashCode methods accordingly, you will get confusing results.
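To make the point about equals and hashCode concrete, here is a minimal sketch. The NamedAnimal class, its species and name fields, and the missingFrom helper are all hypothetical (the question never shows the Animal classes); the sketch assumes that two animals count as equal when both fields match.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the question's Animal types (an assumption).
class NamedAnimal {
    private final String species;
    private final String name;

    NamedAnimal(String species, String name) {
        this.species = species;
        this.name = name;
    }

    // Two animals are equal when species and name both match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NamedAnimal)) return false;
        NamedAnimal that = (NamedAnimal) other;
        return species.equals(that.species) && name.equals(that.name);
    }

    // Must agree with equals, otherwise HashSet lookups misbehave.
    @Override
    public int hashCode() {
        return Objects.hash(species, name);
    }

    @Override
    public String toString() {
        return species + ":" + name;
    }
}

class EqualsSketch {
    // Elements of b that are not in a, judged by equals/hashCode.
    static Set<NamedAnimal> missingFrom(Set<NamedAnimal> a, Set<NamedAnimal> b) {
        Set<NamedAnimal> result = new HashSet<>(b);
        result.removeAll(a);
        return result;
    }

    public static void main(String[] args) {
        Set<NamedAnimal> first = new HashSet<>();
        first.add(new NamedAnimal("cat", "Tom"));
        first.add(new NamedAnimal("dog", "Rex"));

        Set<NamedAnimal> second = new HashSet<>();
        second.add(new NamedAnimal("cat", "Tom")); // separate but equal instance
        second.add(new NamedAnimal("dog", "Rex"));
        second.add(new NamedAnimal("alligator", "Al"));

        System.out.println(missingFrom(first, second)); // [alligator:Al]
    }
}
```

Without the equals and hashCode overrides, the two "cat:Tom" instances would be treated as different objects and the result would contain all three elements of the second set.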

万水千山粽是情ミ 2024-09-19 06:59:41

I suggest that you put your objects in sets and then use an intersection of the sets:

// Considering you put your objects in setA and setB

Set<Object> intersection = new HashSet<Object>(setA);
intersection.retainAll(setB);

After that you can use removeAll to get the difference for either of the two sets (note that this modifies setA and setB in place):

setA.removeAll(intersection);
setB.removeAll(intersection);

Inspired by: http://hype-free.blogspot.com/2008/11/calculating-intersection-of-two-java.html
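Starting from the arrays in the question, the same idea could be sketched as follows. The class and method names are made up for illustration, and strings stand in for the Animal instances; the helper copies its inputs, so neither array is modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetDifferenceSketch {
    // Elements of 'from' that do not occur in 'other'; inputs are left untouched.
    static <T> Set<T> difference(T[] from, T[] other) {
        Set<T> result = new HashSet<>(Arrays.asList(from));
        result.removeAll(new HashSet<>(Arrays.asList(other)));
        return result;
    }

    public static void main(String[] args) {
        String[] animals = { "cat", "dog" };
        String[] animals2 = { "cat", "dog", "alligator" };

        System.out.println(difference(animals2, animals)); // [alligator]
        System.out.println(difference(animals, animals2)); // []
    }
}
```

Calling the helper once in each direction gives the elements unique to each array; like any HashSet-based approach, it relies on the elements' equals and hashCode.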

旧城空念 2024-09-19 06:59:41

Well, you could use a Set instead and call its removeAll() method.

Or you could use the following simple but slow algorithm:

List<Animal> differences = new ArrayList<Animal>();

for (Animal a1 : animals) {
    boolean isInSecondArray = false;
    for (Animal a2 : animals2) {
        // == compares references: true only for the very same instance
        if (a1 == a2) {
            isInSecondArray = true;
            break;
        }
    }

    if (!isInSecondArray) {
        differences.add(a1);
    }
}

Then differences will contain all the objects that are in the animals array but not in the animals2 array. In a similar way you can do the opposite (get all the objects that are in animals2 but not in animals).
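If the nested loop turns out to be too slow, one possible variation (not part of the answer above) keeps the same reference comparison as == but looks elements up in an identity-based set built from java.util.IdentityHashMap. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class IdentityDifferenceSketch {
    // Elements of 'from' whose exact instance does not occur in 'other'.
    static <T> List<T> difference(T[] from, T[] other) {
        // A set that compares members by reference (==) rather than equals().
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
        Collections.addAll(seen, other);

        List<T> differences = new ArrayList<>();
        for (T item : from) {
            if (!seen.contains(item)) {
                differences.add(item);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Object cat = new Object();
        Object dog = new Object();
        Object alligator = new Object();

        Object[] animals = { cat, dog };
        Object[] animals2 = { cat, dog, alligator };

        List<Object> onlyInSecond = difference(animals2, animals);
        System.out.println(onlyInSecond.size());              // 1
        System.out.println(onlyInSecond.get(0) == alligator); // true
    }
}
```

This does one pass over each array instead of comparing every pair, at the cost of an extra set in memory.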

狠疯拽 2024-09-19 06:59:41

You may want to look at this article for more information:

http://download-llnw.oracle.com/javase/tutorial/collections/interfaces/set.html

As was mentioned, removeAll() is made for this, but you will want to do it twice, once in each direction, so that you get the items missing from each side. You can then combine the two results into a list of all the differences.

But, this is a destructive operation, so if you don't want to lose the information, copy the Set and operate on that one.

UPDATE:

It appears that my assumption about what is in the array was wrong, so removeAll() won't work. Also, with a 5 ms requirement, the search could be a problem depending on the number of items.

So, it would appear a HashMap<String, Animal> would be the best option, as it is fast in searching.

Animal is an interface with at least one property, String name. For each class that implements Animal, write code for equals and hashCode. You can find some discussion here: http://www.ibm.com/developerworks/java/library/j-jtp05273.html. This way, if you want the hash value to be a combination of the type of animal and the name, then that will be fine.

So, the basic algorithm is to keep everything in the hashmaps. To search for differences, get the keys of one map and check whether each key is contained in the other map; if it isn't, put the corresponding value into a List<Object>.
You will want to do this twice, once in each direction. If you have at least a dual-core processor, you may get some benefit from running the two searches in separate threads, but then you will want to use one of the concurrent data types added in JDK 5 so that you don't have to worry about synchronization on the combined list of differences.
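A single-threaded sketch of that map-based search might look like this. The class and method names are hypothetical, and the example keys each animal by its name, with plain strings standing in for the Animal values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapDifferenceSketch {
    // Values whose key appears in exactly one of the two maps.
    static <K, V> List<V> differences(Map<K, V> first, Map<K, V> second) {
        List<V> result = new ArrayList<>();
        collectMissing(first, second, result);
        collectMissing(second, first, result);
        return result;
    }

    // Appends to 'out' every value of 'source' whose key is absent from 'other'.
    private static <K, V> void collectMissing(Map<K, V> source, Map<K, V> other, List<V> out) {
        for (Map.Entry<K, V> entry : source.entrySet()) {
            if (!other.containsKey(entry.getKey())) {
                out.add(entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> animals = new HashMap<>();
        animals.put("Tom", "cat");
        animals.put("Rex", "dog");

        Map<String, String> animals2 = new HashMap<>(animals);
        animals2.put("Al", "alligator");

        System.out.println(differences(animals, animals2)); // [alligator]
    }
}
```

Each containsKey call is a hash lookup, so the whole search is roughly linear in the total number of entries. Whether splitting the two passes across threads helps is something to measure rather than assume.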

So, I would write it first as single-threaded code and test it, to get an idea of how much faster it is compared to the original implementation.
Then, if you need it faster, try using threads and compare again to see if there is a speed increase.

Before making any optimization, ensure you have some metrics on what you already have, so that you can compare and see whether a given change leads to an increase in speed.

If you make too many changes at a time, one may bring a large speed improvement while others cause a performance decrease that goes unnoticed, which is why you should make one change at a time.

Don't throw away the other implementations, though: by using unit tests and running each perhaps 100 times, you can get an idea of what improvement each change gives you.

小清晰的声音 2024-09-19 06:59:41

I don't care about perf for my usages (and you shouldn't either, unless you have a good reason to, and you find out via your profiler that this code is the bottleneck).

What I do is similar to functional's answer. I use the LINQ set operators (Except) to get the difference for each list:

http://msdn.microsoft.com/en-us/library/bb397894.aspx

Edit:

Sorry, I didn't notice this is Java. Sorry, I'm off in C# la-la land, and they look very similar :)
