Efficient way to get the index of the minimum value in a long vector, Python
I have a long list of longitude values (len(Lon) = 420481), and another one of latitude values. I want to find the corresponding latitude to the minimum of the longitude.
I tried:
SE_Lat = [Lat[x] for x,y in enumerate(Lon) if y == min(Lon)]
but this takes ages to finish.
Does anyone know a more efficient way?
Maybe you also have a suggestion for this:
I now try to find the closest corresponding latitude to a new longitude, which is not in the original longitude vector. I tried this:
minDiff = [min(abs(x - lon_new) for x in lons)] # not very quick, but works
[(lat,lon) for lat,lon in izip(lats,lons) if abs(lon-lon_new)==minDiff]
The last line throws an error, because there are multiple matches. I don't know at the moment how to find only one value, let's say the first. Any help is greatly appreciated!
Comments (6)
May I recommend numpy?
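A minimal sketch of what a numpy version could look like, assuming the data is held in two arrays. The sample values and `lon_new` are made up for illustration; `np.argmin` returns the index of the first occurrence of the minimum, so ties are not an issue.

```python
import numpy as np

# Small made-up sample; the real data would be the 420481-element vectors.
Lon = np.array([12.5, 3.1, 7.8, 3.1, 9.0])
Lat = np.array([45.0, 51.2, 48.3, 50.0, 47.1])

# Index of the first occurrence of the smallest longitude.
idx = np.argmin(Lon)
SE_Lat = Lat[idx]

# Nearest-longitude lookup for a value that is not in Lon
# (lon_new is a hypothetical query value).
lon_new = 8.0
nearest = np.argmin(np.abs(Lon - lon_new))
nearest_lat = Lat[nearest]
```

The same `argmin` call covers the second part of the question too, since it picks a single index even when several entries are equally close.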
This is way faster than the min(izip()) solution (~20x on my setup with 420481 randomly created records), although of course you'd need to store your data values in numpy arrays to take advantage of this speed-up.
Rather than jumping right in with one of the many alternatives for solving this (which can be seen in the other answers), it's worth enumerating why the code in the original example is so slow.
We know from the OP that len(Lon) == 420481. Now, finding the minimum value is an O(N) operation (you have to look at every value at least once). In a list comprehension, the condition is reevaluated on every iteration. The above code recalculates the minimum value on every pass through the loop, blowing what should be an O(N) operation out to O(N^2) (a mere 177 billion iterations in this case). Simply caching the result of min(Lon) in a local variable and using that in the loop condition instead of recalculating it every iteration would likely bring the runtime down to an acceptable level.
However, the way I would personally go about it (assuming I wanted all of the latitude, longitude and index later on):
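One way this might look, as a sketch rather than the exact code from the answer: pair each longitude with its index via `enumerate` and let `min` compare on the longitude only. The sample lists are made up.

```python
from operator import itemgetter

# Small made-up sample standing in for the real lists.
Lon = [12.5, 3.1, 7.8, 3.1, 9.0]
Lat = [45.0, 51.2, 48.3, 50.0, 47.1]

# enumerate() yields (index, lon) pairs; key=itemgetter(1) makes min()
# compare on the longitude only, in a single O(N) pass.
min_index, min_lon = min(enumerate(Lon), key=itemgetter(1))
min_lat = Lat[min_index]
```

When several entries share the minimum, `min` returns the first one it meets, so this yields a single index.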
There are plenty of possibilities though, and which one is best will vary based on the exact use case.
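For comparison, the simpler fix of caching the minimum before the loop could be sketched like this (sample data made up, `lon_min` is an illustrative name):

```python
# Small made-up sample standing in for the real lists.
Lon = [12.5, 3.1, 7.8, 3.1, 9.0]
Lat = [45.0, 51.2, 48.3, 50.0, 47.1]

# Compute the minimum once, outside the comprehension.
lon_min = min(Lon)

# The condition now compares against a cached value, so the whole
# thing is two O(N) passes instead of O(N^2).
SE_Lat = [Lat[i] for i, lon in enumerate(Lon) if lon == lon_min]
```

This keeps the original behaviour of returning every latitude whose longitude equals the minimum.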
Just first find the index:
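A sketch of that idea with plain lists, assuming the built-in `list.index` is what was meant (sample data made up):

```python
# Small made-up sample standing in for the real lists.
Lon = [12.5, 3.1, 7.8, 3.1, 9.0]
Lat = [45.0, 51.2, 48.3, 50.0, 47.1]

# min() scans the list once; index() scans again until the first match.
idx = Lon.index(min(Lon))
SE_Lat = Lat[idx]
```

Both calls run in C, so the two passes are usually cheap even for a few hundred thousand elements.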
As per Ignacio's solution, if you are using python2, you will want to use izip rather than zip. This is, however, true for everything you do in python2.
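Ignacio's code is not reproduced here; judging by the min(izip()) reference in another answer, it presumably resembles the sketch below. The import shim and the sample data are assumptions for illustration.

```python
try:
    # Python 2: izip is the lazy variant of zip.
    from itertools import izip as zip
except ImportError:
    # Python 3: the built-in zip is already lazy.
    pass

# Small made-up sample standing in for the real lists.
Lon = [12.5, 3.1, 7.8, 3.1, 9.0]
Lat = [45.0, 51.2, 48.3, 50.0, 47.1]

# Tuples compare element by element, so the smallest (lon, lat) pair wins.
# On a tie in longitude, the pair with the smaller latitude is returned.
min_lon, SE_Lat = min(zip(Lon, Lat))
```

Note the tie-breaking: with duplicate minimum longitudes this returns the lowest latitude among them, not the first one in list order.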
Here was my original answer:
But I see that the OP seemed to be allowing for there being multiple matches at the minimum lon value, and for this, I don't think there is a one-liner. The trick is, you only want to find min(lons) once, not once for every lat,lon pair:
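A sketch of that explicit two-line version, using made-up sample lists:

```python
# Small made-up sample standing in for the real lists.
lats = [45.0, 51.2, 48.3, 50.0, 47.1]
lons = [12.5, 3.1, 7.8, 3.1, 9.0]

# Line 1: find the minimum longitude exactly once.
minlon = min(lons)
# Line 2: collect every (lat, lon) pair that sits at that minimum.
matches = [(lat, lon) for lat, lon in zip(lats, lons) if lon == minlon]
```

If only one value is wanted, `matches[0]` gives the first match in list order.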
This one-liner might work for you, since the lambda argument minlon should only be computed once:
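Something along these lines, as a sketch: the lambda is called immediately with `min(lons)`, so the minimum is evaluated once as an argument rather than inside the comprehension. The sample lists are made up.

```python
# Small made-up sample standing in for the real lists.
lats = [45.0, 51.2, 48.3, 50.0, 47.1]
lons = [12.5, 3.1, 7.8, 3.1, 9.0]

# min(lons) is evaluated once, when the lambda is called, and bound to minlon.
matches = (lambda minlon: [(lat, lon) for lat, lon in zip(lats, lons)
                           if lon == minlon])(min(lons))
```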
Not sure how well it will work on 420481-element lists though. And for readability and long-term support, I would probably choose the more explicit 2-liner solution.
Last point:
Sometimes you only get one pass through a sequence, such as when it is an iterator, or the output of a generator. To support multiple matches and take only one pass through the two lists, this was the best I could do:
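A single-pass version could be sketched as below. The function name `min_lon_matches` and the sample data are hypothetical; the idea is to track the smallest longitude seen so far and reset the list of matches whenever a smaller one turns up.

```python
def min_lon_matches(pairs):
    """Return all (lat, lon) pairs at the minimum lon, reading pairs once."""
    min_lon = None
    matches = []
    for lat, lon in pairs:
        if min_lon is None or lon < min_lon:
            # New minimum: discard earlier matches and start over.
            min_lon = lon
            matches = [(lat, lon)]
        elif lon == min_lon:
            # Another pair at the current minimum.
            matches.append((lat, lon))
    return matches


# Small made-up sample, fed in as a generator that can only be consumed once.
lats = [45.0, 51.2, 48.3, 50.0, 47.1]
lons = [12.5, 3.1, 7.8, 3.1, 9.0]
pairs = ((lat, lon) for lat, lon in zip(lats, lons))
result = min_lon_matches(pairs)
```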
Prints: