Is it possible to remove duplicates from a sorted list in better than O(n) time?
I suspect there is a way, if you can save time by locating the other end of a range of repeated values faster than by iterating through that sublist.
In general, no. Imagine a list of N duplicates. You would have to make N-1 removals, hence O(N).
If you specify a particular data structure with better than O(1) removal of elements, then there might be a better way for certain sorts of inputs.
Even if you can efficiently remove a range of elements in O(1), and it takes O(1) time to find a duplicate - imagine a list where there are N/2 pairs of duplicates. You'll still have to do N/2 searches and remove N/2 ranges, both of which are O(N).
(There's also a bit of ambiguity, as the question title is 'remove duplicates' but the body is specific to removing one range.)
If the list resulting from your sort has the following representation, where each node has a value and an occurrence count for that value, then removing the duplicates of one value trivially sets that node's count to 1. (A skip list probably has similar characteristics, assuming a decent garbage-collected environment where there's no cost to reclaiming memory.) So that would be O(1) for one duplicated value. If you need to remove all duplicates from the list, it is still O(N).
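A minimal sketch of that counted-node representation in Python (the `Node`, `dedupe_value`, and `dedupe_all` names are illustrative, not from any library): each node carries a value plus an occurrence count, so collapsing the duplicates of one value is a single counter update.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Linked-list node storing a value and its occurrence count."""
    value: int
    count: int                        # how many times `value` occurs
    next: "Optional[Node]" = None

def dedupe_value(node: Node) -> None:
    """Collapse all duplicates of this node's value: O(1)."""
    node.count = 1

def dedupe_all(head: Optional[Node]) -> None:
    """Collapse duplicates of every value: O(N) in distinct values."""
    while head is not None:
        head.count = 1
        head = head.next

# The run-length list for [1, 1, 1, 2, 2]:
head = Node(1, 3, Node(2, 2))
dedupe_value(head)                    # 1 now occurs once; 2 still twice
```

The design choice is that the sort itself builds run lengths, so "removing duplicates of one value" never touches more than one node.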
In general there is not, because you can always construct a case that takes O(n) (a list with no duplicates). If you start making assumptions about the data, however (for instance, that there are at most log n distinct elements), you may get something better (I'm not sure in this particular case, though).
This does of course assume that you have some way of doing efficient "bulk removes", meaning that you can remove any range of equal elements in O(1), regardless of its size.
There can't be, as comparing all the elements with each other requires n*(n-1) = n²-n comparisons...
I would go for a 'binary search' approach for finding ends of ranges:
Let's assume we have a sorted list of n elements.
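A minimal sketch of that binary-search idea (assuming the sorted list supports O(1) random access, e.g. an array-backed list rather than a linked list): `bisect_right` jumps to the far end of a run of equal values in O(log n) instead of iterating through it, so the whole pass costs O(k log n) for k distinct values, though the worst case (no duplicates) is still O(n log n) searches over n outputs.

```python
from bisect import bisect_right

def dedupe_sorted(xs):
    """Return the distinct values of a sorted list, skipping each run
    of equal values with a binary search instead of iterating it."""
    out = []
    i = 0
    while i < len(xs):
        out.append(xs[i])
        # Index just past the run of values equal to xs[i],
        # searching only in xs[i:].
        i = bisect_right(xs, xs[i], i)
    return out

print(dedupe_sorted([1, 1, 1, 2, 3, 3, 3, 3, 4]))  # [1, 2, 3, 4]
```

This builds a new list rather than removing in place; in-place deletion from a Python list would itself cost O(n) per slice removal.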