Java:两个列表之间的差异
我公司的猫群应用程序跟踪一群猫。它需要定期比较 previousOrder
和 currentOrder
(每个都是 ArrayList
),并通知 cat-wranglers 任何更改。
每只猫都是独一无二的,并且只能在每个列表中出现一次(或根本不出现)。大多数情况下,previousOrder
和 currentOrder
列表具有相同的内容和相同的顺序,但可能会发生以下任何情况(从更频繁到不太频繁):
- 猫的顺序完全打乱
- 猫在列表中单独向上或向下移动
- 新猫在车队中的特定点加入
- 猫离开车队
这看起来像 编辑距离问题 对我来说。理想情况下,我正在寻找一种算法来确定使 previousOrder
匹配 currentOrder
所需的步骤:
- 将
Fluffy
移动到位置12
- 在位置
37
插入Snuggles
- 删除
Mr. Chubbs
- 等。
该算法还应该识别场景#1,在这种情况下,新订单将被完整传达。
最好的方法是什么?
(这个帖子和该帖子提出了相似的观点问题,但它们都处理排序列表。我的是有序,但未排序。)
编辑
< strong>Levenshtein 算法是一个很好的建议,但我担心创建矩阵的时间/空间要求。我的主要目标是尽快确定并传达变更。这比查找添加内容并发送“这是新猫,这是当前订单”之类的消息更快。
My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare previousOrder
to currentOrder
(each is an ArrayList<Cat>
) and notify the cat-wranglers of any changes.
Each cat is unique and can only appear once in each list (or not at all). Most of the time, the previousOrder
and currentOrder
lists have the same contents, in the same order, but any of the following can happen (from more frequent to less frequent):
- The order of cats is scrambled completely
- Cats individually move up or down in the list
- New cats join, at a specific point in the convoy
- Cats leave the convoy
This appears like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make previousOrder
match currentOrder
:
- MOVE
Fluffy
to position12
- INSERT
Snuggles
at position37
- DELETE
Mr. Chubbs
- etc.
The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety.
What's the best approach for this?
(This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.)
EDIT
The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space requirement of creating a matrix. My main goal is to determine and communicate the changes as quickly as possible. Something that is faster than finding the additions and sending message along the lines of "Here are the new cats, and here is the current order."
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这是我用来合并两个列表(
old
和new
)的算法。它不是最优雅或最高效的,但对于我使用它的数据来说似乎可以正常工作。new
是最新更新的数据列表,而old
是需要转换为new
的过时列表。该算法对旧列表执行操作 - 相应地删除、移动和插入项目。删除都是预先完成的,以使项目在第二个循环中的位置更可预测。
这背后的驱动力是将来自服务器的数据列表与浏览器 DOM 中的
行同步(使用 JavaScript)。之所以需要它,是因为我们不想在数据发生变化时重新绘制整个表;列表之间的差异可能很小,仅影响一两行。它可能不是您正在寻找的数据算法。如果没有,请告诉我,我会删除它。
可能可以为此进行一些优化。但对于我和我正在使用的数据来说,它的性能和可预测性足够。
Here's an algorithm I put together to merge two lists,
old
andnew
. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for.new
is the most updated list of data, andold
is the out-of-date list that needs to get transformed intonew
. The algorithm performs its operations on theold
list - removing, moving, and inserting items accordingly.The deletions are all done up front to make the positions of the items more predictable in the second loop.
The driving force behind this was to sync a list of data from the server with
<table>
rows in a browser DOM (using javascript). It was needed because we didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and only affect one or two rows. It may not be the algorithm you're looking for for your data. If not, let me know and I'll delete this.There are probably some optimizations that could be made for this. But it is performant and predictable enough for me and the data I'm working with.
编辑距离度量。
http://www.levenshtein.net/
Levenshtein distance metric.
http://www.levenshtein.net/
解决这个问题的一个有效方法是使用动态规划。维基百科有一个密切相关问题的伪代码:计算 Levenshtein 距离。
跟踪实际操作并纳入“扰乱”操作应该不会太困难。
An efficient way to solve this is by using dynamic programming. Wikipedia has pseudo-code for a closely related problem: Computing Levenshtein distance.
Keeping track of the actual operations and incorporating the "scramble" operation shouldn't be too difficult.
我知道提问者正在寻求 Java 解决方案,但我在寻求用 C# 实现的算法时遇到了这个问题。
这是我的解决方案,它生成简单 IListDifference 值的枚举:ItemAddedDifference、ItemRemovedDifference 或 ItemMovedDifference。
它使用源列表的工作副本逐项建立将其转换为匹配目标列表所需的修改。
只是注意到这使用了 IEnumerable 上的自定义扩展方法 - 'IndexOf':
I know the questioner was seeking a Java solution, but I came across this question whilst seeking an algorithm to implement in C#.
Here's my solution, which generates an enumeration of simple IListDifference values: either ItemAddedDifference, ItemRemovedDifference or ItemMovedDifference.
It uses a working copy of the source list to establish, item by item, what modifications are necessary to transform it to match the target list.
Just noticed this makes use of a custom extension method on IEnumerable - 'IndexOf':
我最近不得不这样做,除非项目可能存在多次。这件事很复杂,但我能够使用前瞻计数器和其他一些疯狂的东西来做到这一点。它看起来很像 Rob 的解决方案,所以感谢他让我开始!
首先,假设我们想要返回将第一个列表转换为第二个列表的操作列表:
并且我们有一些辅助方法来构造操作。实际上,您不需要“移动”操作,甚至还可以进行“交换”(或相反),但这就是我所采用的:
现在我们将定义一个特殊的类来保存我们的前瞻counts:
还有一个帮助方法来获取列表中下一个键的索引:
现在我们准备好进行转换:
让您的头脑变得有点棘手,但基本上您会进行删除,以便您知道每个您正在插入或移动的密钥。然后你再次浏览列表,如果有足够的,你从列表中你还没有看到的部分移动一个,否则插入。当你到达终点时,一切都已排好。
I recently had to do this, except items could exist multiple times. This complicated things, but I was able to do it using look-ahead counters and some other craziness. It looks a lot like Rob's solution, so thanks to him for getting me started!
First off, let's assume that we want to return the list of operations that will transform the first list into the second:
and we have some helper methods to construct operations. You actually don't need a "move" operation, and you could even have a "swap" as well (or instead), but this is what I went with:
Now we'll define a special class to hold our look-ahead counts:
And a helper method to get the index of the next key in the list:
Now we're ready to do the transform:
It's a bit tricky to get your head around, but basically you do the deletes so that you know for every key you are either inserting or moving. Then you run through the list again and if there's enough, you move one from the part of the list your haven't seen yet, otherwise insert. By the time you get to the end, it all lines up.