How can I efficiently iterate over a pandas DataFrame with millions of rows and apply a function to each row?

Posted on 2025-01-16 02:07:02

I have a pandas DataFrame with 7 million rows of flight data. Each row includes the location and the time, which I use to pull the weather for that time. Right now, processing 1000 rows takes 83 seconds, which is far too slow given that I have 7 million to go through. I am not sure how to go about this. I've read that itertuples would be a bit quicker, or that I could use Dask or multithreading.
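For reference, here is a minimal sketch of what row iteration with itertuples looks like, using a small made-up frame with the same ORIGIN and FL_DATE columns. The function describe_flight is a hypothetical stand-in for the real per-row work. This mainly trims the per-row overhead of repeated iloc lookups; it does not make the per-row function itself any cheaper.

```python
import pandas as pd

# Toy stand-in for the flight data; the values are made up for illustration.
flights = pd.DataFrame({
    "ORIGIN": ["JFK", "LAX", "JFK"],
    "FL_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
})


def describe_flight(origin, flight_date):
    """Hypothetical placeholder for the real per-row function."""
    return f"{origin} on {flight_date:%Y-%m-%d}"


# itertuples yields lightweight namedtuples, so columns are read as attributes
# instead of building a full Series for every row.
results = [
    describe_flight(row.ORIGIN, row.FL_DATE)
    for row in flights.itertuples(index=False)
]
print(results)
```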

import warnings
import pandas as pd
from datetime import datetime
from meteostat import Point, Daily
import airportsdata
import time
import threading
import os

warnings.simplefilter(action='ignore', category=FutureWarning)

airports = airportsdata.load('IATA')
all1 = pd.DataFrame()
all2 = pd.DataFrame()

first_mil = pd.read_csv('first_mil.csv')

def weather_from_IATA(airport, when):
    """
    Uses airport data from "airportsdata" to find the latitude and longitude,
    passes them to meteostat, and returns the daily weather at that airport
    for the given date.
    """
    # Point expects latitude first, then longitude
    lat, lon = airports[airport]['lat'], airports[airport]['lon']

    # Set time period (a single day)
    start, end = when, when

    # Create Point for airport
    location = Point(lat, lon)

    # Get daily data for location
    data = Daily(location, start, end)
    weather_time = data.fetch()

    return weather_time

# Start the timer before the loop so the elapsed time can be computed
start = time.time()

frames = []
for air in range(1000):
    row = first_mil.iloc[air]
    frames.append(weather_from_IATA(row['ORIGIN'], row['FL_DATE']))

# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
all1 = pd.concat(frames)

end1 = time.time()
print("iteration for append took: ", (end1 - start), "sec")
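Because the lookup is daily and keyed only on airport and date, many of the 7 million flights probably share the same (ORIGIN, FL_DATE) pair. One option, sketched below under assumptions, is to look up each distinct pair once and merge the result back onto the flights. Here fetch_weather is a hypothetical stub standing in for weather_from_IATA (it returns fake values so the example runs offline), and the tavg column and the sample rows are illustrative only.

```python
import pandas as pd

# Toy stand-in for the flight data; the values are made up for illustration.
flights = pd.DataFrame({
    "ORIGIN": ["JFK", "LAX", "JFK", "JFK"],
    "FL_DATE": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02"]
    ),
    "FLIGHT_NO": [1, 2, 3, 4],
})

calls = []  # records every lookup so the number of calls can be inspected


def fetch_weather(origin, flight_date):
    """Hypothetical stub; swap in the real weather lookup here."""
    calls.append((origin, flight_date))
    return {"tavg": float(len(origin) + flight_date.day)}  # fake value


# 1. Reduce the flights to the distinct (airport, date) pairs.
pairs = flights[["ORIGIN", "FL_DATE"]].drop_duplicates()

# 2. Do one lookup per distinct pair.
weather = pd.DataFrame(
    [
        {
            "ORIGIN": p.ORIGIN,
            "FL_DATE": p.FL_DATE,
            **fetch_weather(p.ORIGIN, p.FL_DATE),
        }
        for p in pairs.itertuples(index=False)
    ]
)

# 3. Attach the weather to every flight with a single merge.
enriched = flights.merge(weather, on=["ORIGIN", "FL_DATE"], how="left")
print(len(calls), "lookups for", len(flights), "flights")
```

How much this helps depends on how many distinct airport and date combinations the real data contains; the remaining lookups could still be spread across threads or Dask workers if they are I/O-bound.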

