How to optimize Python code that calculates the distance between two GPS points
I'm looking for a faster way to calculate the distance between two GPS points (latitude and longitude) in Python. Here is my code; I want to optimize it so it runs faster.
from math import radians, sin, cos, atan2, sqrt

def CalcDistanceKM(lat1, lon1, lat2, lon2):
    # Haversine formula: great-circle distance on a sphere of radius 6371 km
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = 6371 * c
    return distance
This code calculates the distance between a latitude/longitude pair taken from each of two different Excel (CSV) files and returns the distance between them.
Some more code to explain the behavior:
for i in range(len(File1)):
    for j in range(len(File2)):
        if File1['AA'][i] == File2['BB'][j]:
            distance = CalcDistanceKM(File2['LATITUDE'][j], File2['LONGITUDE'][j],
                                      File1['Latitude'][i], File1['Longitude'][i])
            File3 = File3.append({'DistanceBetweenTwoPoints': distance}, ignore_index=True)
Thanks.
Answers (2)
Prepare your points into NumPy arrays and then call this haversine function once with the prepared arrays to take advantage of C performance and vectorisation optimisations - both freebies from the brilliant NumPy library:
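A sketch of what that vectorised haversine could look like, replacing the math functions from the question with their NumPy equivalents (the function name is illustrative):

import numpy as np

def CalcDistanceKM_np(lat1, lon1, lat2, lon2):
    # Same haversine formula as in the question, but every operation works on
    # whole NumPy arrays at once instead of one pair of scalars per call.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return 6371 * c  # kilometres

Passing whole columns, e.g. File1['Latitude'].to_numpy(), computes all the distances in a single call, provided the two sets of arrays are already aligned row for row.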
I see you are iterating File1 and File2 repeatedly - are you searching for matches there? For loops are very slow, so that will be a big bottleneck, but without a bit more information on the CSVs being used and how records in File1 are matched with File2 I can't help with that. Maybe add the first couple of records from both files to the question to give it a bit of context?
Update:
Thanks for including the colab link.
You start with two dataframes, drive_test and Cells. One of your "if" conditions can be written as a pandas merge and filter, based on this cross-merge method: Create combination of two pandas dataframes in two dimensions.
You should then have all the results for the first condition, and this can be repeated for the remaining conditions. I'm not able to test this on your CSVs, so it might need a little bit of debugging, but the idea should be fine.
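As a rough sketch of that idea, assuming drive_test plays the role of File1 and Cells of File2, and reusing the key column names 'AA' and 'BB' from the question (the real column names may differ):

import pandas as pd

# Pair every row of drive_test with every row of Cells (how="cross" needs
# pandas 1.2+; older versions can merge on a constant dummy key instead).
combined = drive_test.merge(Cells, how="cross")

# Keep only the pairs that satisfy the first matching condition.
matched = combined[combined['AA'] == combined['BB']]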
Note: depending on how big your CSVs are, this could explode into an extremely big dataframe and max out your RAM, in which case you are pretty much stuck with iterating one by one as you are now - unless you want to make a piecewise method where you iterate the rows of one dataframe and match all the rows in the other that satisfy the conditions. That will still be faster than iterating both one at a time, but probably slower than doing it all at once.
Update - trying the second idea, since the new dataframe seems to crash the kernel.
In your loop, you can do something like this for the first condition (and similarly for all the next matching conditions):
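A sketch of that piecewise approach, again assuming drive_test corresponds to File1 and Cells to File2, with the column names from the question and the vectorised CalcDistanceKM_np above:

import pandas as pd

results = []
for _, row in drive_test.iterrows():
    # One vectorised pandas query finds every matching row in Cells for this row.
    matches = Cells[Cells['BB'] == row['AA']]
    if len(matches):
        d = CalcDistanceKM_np(matches['LATITUDE'].to_numpy(),
                              matches['LONGITUDE'].to_numpy(),
                              row['Latitude'], row['Longitude'])
        results.extend(d)

File3 = pd.DataFrame({'DistanceBetweenTwoPoints': results})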
This should be quite considerably faster anyway, since you'll be using just one Python "for" loop and then letting the superfast numpy/pandas query do the next step. The template should also be applicable to your remaining conditions.
I'd suggest having a look at the geod module from pyproj. Since pyproj is an interface to the C++ PROJ library, I'd expect a major speedup compared to pure Python.
https://pyproj4.github.io/pyproj/stable/examples.html#geodesic-line-length
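As a minimal sketch, the Geod class computes geodesic distances on the WGS84 ellipsoid; the coordinates below are just example values, and geod.inv also accepts arrays, so it can be called once on whole columns:

from pyproj import Geod

geod = Geod(ellps="WGS84")  # geodesic (ellipsoidal) distances, not spherical

# inv() expects lon/lat order and returns the forward azimuth, back azimuth
# and the distance in metres.
lat1, lon1 = 30.0444, 31.2357   # example point A
lat2, lon2 = 29.9792, 31.1342   # example point B
_, _, dist_m = geod.inv(lon1, lat1, lon2, lat2)
dist_km = dist_m / 1000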