How can I efficiently iterate over a pandas dataset with millions of rows and apply a function to each row?
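I have a pandas DataFrame with 7 million instances of flight data. Each flight record includes a location and a time, which I use to pull the weather at that place and time. Right now my code takes 83 seconds for 1,000 instances, which is far too long when I have 7 million to go through. I am not sure how to approach this. I've read that itertuples would be a bit quicker, or that I could use Dask or multithreading.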
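The question mentions itertuples. Here is a minimal sketch contrasting the `df.iloc[i][col]` row access used in the question's loop with `DataFrame.itertuples`. It uses a hypothetical `fake_weather` function in place of the meteostat lookup so that it runs offline. `itertuples` avoids building a new Series for every row. However, at roughly 83 ms per row the time is most likely dominated by the weather lookup itself, so this change alone probably won't make 7 million rows feasible.

```python
import pandas as pd

# Hypothetical stand-in for the real weather lookup, so the sketch runs offline.
def fake_weather(iata, when):
    return f"{iata}@{when.date()}"

df = pd.DataFrame({
    "ORIGIN": ["JFK", "LAX", "JFK"],
    "FL_DATE": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
})

# Pattern from the question: df.iloc[i] builds a whole Series for every row.
slow = [fake_weather(df.iloc[i]["ORIGIN"], df.iloc[i]["FL_DATE"]) for i in range(len(df))]

# itertuples yields lightweight namedtuples; column names become attributes.
fast = [fake_weather(row.ORIGIN, row.FL_DATE) for row in df.itertuples(index=False)]

print(fast)
```

Both loops produce the same values. Only the per-row access overhead differs.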
```python
import time
import warnings

import airportsdata
import pandas as pd
from meteostat import Daily, Point

warnings.simplefilter(action='ignore', category=FutureWarning)

airports = airportsdata.load('IATA')  # dict keyed by IATA airport code
first_mil = pd.read_csv('first_mil.csv')


def weather_from_IATA(airport, when):
    """
    Uses airport data from "airportsdata" to find the latitude and longitude,
    then passes them to meteostat, which returns the weather at that
    airport at that time.
    """
    lat, lon = airports[airport]['lat'], airports[airport]['lon']
    # Set time period (a single day)
    start, end = when, when
    # Create Point for airport (meteostat expects latitude first)
    location = Point(lat, lon)
    # Get daily data for location
    data = Daily(location, start, end)
    weather_time = data.fetch()
    return weather_time


start = time.time()  # must be defined before the loop for the timing below
frames = []
for air in range(1000):
    iata = first_mil.iloc[air]['ORIGIN']
    # meteostat expects datetime objects, not the raw CSV string
    flight_time = pd.to_datetime(first_mil.iloc[air]['FL_DATE'])
    frames.append(weather_from_IATA(iata, flight_time))
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
all1 = pd.concat(frames)
end1 = time.time()
print("iteration for append took: ", (end1 - start), "sec")
```
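Because the loop above spends most of its time waiting on the weather lookup (about 83 ms per row), here is a sketch of a different technique that is not taken from the question. It looks up each unique (airport, day) pair only once, runs those lookups concurrently with the standard library's `ThreadPoolExecutor`, and then merges the results back onto every flight. `fetch_weather`, `add_weather`, `CALLS`, and the `tavg` column are hypothetical names for illustration. The real `weather_from_IATA` returns a DataFrame rather than a dict, so its output would need adapting (for example, by taking its first row). Whether threads actually speed up meteostat in practice depends on how it performs its downloads, which is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

CALLS = []  # hypothetical log of lookups, to show duplicates are skipped

# Hypothetical offline stand-in for weather_from_IATA; returns one dict per lookup.
def fetch_weather(iata, day):
    CALLS.append((iata, day))
    return {"ORIGIN": iata, "FL_DATE": day, "tavg": {"JFK": 1.0, "LAX": 2.0}[iata]}

def add_weather(flights, max_workers=8):
    # 1. Fetch each (airport, day) pair once instead of once per flight.
    pairs = flights[["ORIGIN", "FL_DATE"]].drop_duplicates()
    # 2. The lookups are I/O-bound, so threads can overlap the waiting.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(fetch_weather, pairs["ORIGIN"], pairs["FL_DATE"]))
    weather = pd.DataFrame(rows)
    # 3. Join the per-pair results back onto every flight in one vectorized merge.
    return flights.merge(weather, on=["ORIGIN", "FL_DATE"], how="left")

flights = pd.DataFrame({
    "ORIGIN": ["JFK", "LAX", "JFK", "JFK"],
    "FL_DATE": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"],
})
result = add_weather(flights)
print(result)
```

Here four flights need only three lookups. On 7 million flights, the number of distinct (airport, day) pairs is likely far smaller than the row count, so the savings would grow accordingly.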