Given an array of integers, find the first integer that is unique.

My solution: use std::map.
1. Put each integer into it one by one, with the number as the key and its index as the value (O(n^2 log n)).
2. If there is a duplicate, remove the entry from the map (O(log n)).
3. After putting all numbers into the map, iterate over the map and find the key with the smallest index (O(n)).

The total is O(n^2 log n), because the map needs to do sorting. It is not efficient. Are there better solutions?
I believe that the following would be the optimal solution, at least based on time / space complexity:
Step 1:
Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).
Step 2:
Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).
Some possible optimizations for special cases:
Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).
Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.
For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.
The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.
Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.
All this for no reason. Just using two for-loops and a variable would give you a simple O(n^2) algorithm. If you are taking all the trouble of using a hash map, then it might as well be what @Micheal Goldshteyn suggests.
UPDATE: I know this question is 1 year old, but I was looking through the questions I answered and came across this one. I think there is a better solution than using a hashtable.

When we say unique, we will have a pattern. E.g.: [5, 5, 66, 66, 7, 1, 1, 77]. In this, let's have a moving window of 3. First consider (5, 5, 66). We can easily establish that there is a duplicate here. So move the window by 1 element, giving (5, 66, 66). Same here. Move to the next, (66, 66, 7). Again duplicates here. Next, (66, 7, 1). No duplicates here! Take the middle element, as it has to be the first unique element in the set: the element on its left belongs to a duplicate pair, and so does the 1 on its right. Hence 7 is the first unique element.

space: O(1)
time: O(n) * O(m^2) = O(n) * 9 ≈ O(n)
Inserting into a map is O(log n), not O(n log n), so inserting n keys will be O(n log n). Also, it's better to use a set.
Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem(), which is fast.

@user3612419