How do I move the work of a for loop into pandas' faster apply() function?
I have a dataframe with longitude and latitude columns. I need to get the county name for each location from its longitude and latitude values, using the geopy package.
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  \
0    -114.31     34.19                15.0       5612.0          1283.0
1    -114.47     34.40                19.0       7650.0          1901.0
2    -114.56     33.69                17.0        720.0           174.0
3    -114.57     33.64                14.0       1501.0           337.0
4    -114.57     33.57                20.0       1454.0           326.0

   population  households  median_income  median_house_value
0      1015.0       472.0         1.4936             66900.0
1      1129.0       463.0         1.8200             80100.0
2       333.0       117.0         1.6509             85700.0
3       515.0       226.0         3.1917             73400.0
4       624.0       262.0         1.9250             65500.0
I had success with a for loop:
import geopy

geolocator = geopy.Nominatim(user_agent='1234')
# Reverse-geocode the first 10 rows, one request per row
for index, row in df.iloc[:10, :].iterrows():
    location = geolocator.reverse([row["latitude"], row["longitude"]])
    county = location.raw['address']['county']
    print(county)
The dataset has 17,000 rows, so looping over it like this should be a problem, right?
So I've been trying to figure out how to build a function which I could use in pandas.apply() in order to get quicker results.
def get_zipcodes():
    location = geolocator.reverse([row["latitude"], row["longitude"]])
    county = location.raw['address']['county']
    print(county)

counties = get_zipcodes()
I'm stuck and don't know how to use apply (or any other clever method) here. Any help is much appreciated.
Comments (1)
The pandas calculations in your code are unlikely to be the speed bottleneck when using geopy (see this answer to a different geopy question).
However, if there may be a significant number of rows with duplicate latitude, longitude coordinates, you can use the @cache (or @lru_cache(None)) decorator from functools.
Here is how to use apply() on your dataframe with no special caching:
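A minimal sketch of what such an apply() call might look like, not necessarily the code the answer originally showed. The names get_county and add_county_column are made up for illustration, and reverse is assumed to be any callable that behaves like geolocator.reverse from the question (it returns an object with a .raw dict, or None when nothing is found).

```python
def get_county(row, reverse):
    """Return the county name for one row, or None if it cannot be determined.

    `reverse` is assumed to behave like geopy's `geolocator.reverse`: it takes
    a (latitude, longitude) pair and returns an object with a `.raw` dict,
    or None when nothing is found.
    """
    location = reverse((row["latitude"], row["longitude"]))
    if location is None:
        return None
    # Not every address has a "county" key, so use .get() instead of indexing.
    return location.raw.get("address", {}).get("county")


def add_county_column(df, reverse):
    """Return a copy of `df` with a new "county" column (one lookup per row)."""
    out = df.copy()
    # axis=1 hands each row to get_county as a Series; extra keyword
    # arguments given to apply() are forwarded to the function.
    out["county"] = out.apply(get_county, axis=1, reverse=reverse)
    return out


# Usage with a real geocoder (one network request per row):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="my-county-lookup")  # hypothetical name
# df = add_county_column(df, geolocator.reverse)
```

Written this way, apply() still makes one request per row, so it is tidier than the loop but should not be expected to run meaningfully faster.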
Here is how to use a decorator to cache results (i.e., to avoid going all the way out to a geopy server multiple times) for identical latitude, longitude coordinates:
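One way this could be sketched, again as an illustration and not the answer's original code. The helper names make_cached_county_lookup and add_county_column_cached are hypothetical, and reverse is assumed to behave like geolocator.reverse from the question. lru_cache(maxsize=None) is used here so the sketch also runs on Python versions older than 3.9, where functools.cache is not available.

```python
from functools import lru_cache


def make_cached_county_lookup(reverse):
    """Wrap a reverse-geocoding callable so each (lat, lon) is looked up once."""

    @lru_cache(maxsize=None)  # unbounded cache, same behaviour as functools.cache
    def county_for(lat, lon):
        location = reverse((lat, lon))
        if location is None:
            return None
        # Not every address has a "county" key, so use .get() instead of indexing.
        return location.raw.get("address", {}).get("county")

    return county_for


def add_county_column_cached(df, reverse):
    """Return a copy of `df` with a "county" column, reusing repeated lookups."""
    county_for = make_cached_county_lookup(reverse)
    out = df.copy()
    # Plain floats are hashable, so identical coordinates share one cache entry.
    out["county"] = out.apply(
        lambda row: county_for(float(row["latitude"]), float(row["longitude"])),
        axis=1,
    )
    return out
```

The function returned by make_cached_county_lookup exposes cache_info(), which reports hits and misses and so shows how many requests the cache actually saved. If every coordinate pair in the data is unique, the cache saves nothing.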