Speeding up an incredibly slow for loop
I have a for loop which iterates over a dictionary containing lists of indexes, for example {0: [1, 5, ...], 1: [2, 4, ...], ...}. I need to iterate through this and use the index values to produce new dictionaries based on an enormous dataframe (around 80 million rows).
As you can see from the code below, I have a method which does work, however, it takes an incredible amount of time. I am hoping there is a way to speed this up, by performing the operation in sections, multithreaded, or otherwise. I have seen other similar questions which suggest rewriting this in Cython, however, because I am working with a dataframe I don't see how that would be possible.
total_list = []
for ind in indexes_per_group:
    ship_list = []
    dataframe_indexes = indexes_per_group[ind]
    for index in dataframe_indexes:
        singleLocation_dict = {}
        singleLocation_dict['lat'] = df.loc[index]['LATITUDE']
        singleLocation_dict['lng'] = df.loc[index]['LONGITUDE']
        ship_list.append(singleLocation_dict)
    total_list.append(ship_list)
Many thanks for any help on this; it will be greatly appreciated.
Edit: The dictionary I am looping through comes from the pandas groupby function.
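One likely bottleneck in the loop above is the per-row `df.loc[index]` lookup, which is repeated twice for every index. A common speedup is to fetch each group's rows in a single `.loc` call and let pandas build the dicts with `to_dict('records')`. The sketch below illustrates this on a tiny stand-in dataframe (the data and the group dictionary are made up for the example; the real `df` and `indexes_per_group` come from the question):

```python
import pandas as pd

# Hypothetical small dataframe standing in for the ~80-million-row one
df = pd.DataFrame({
    'LATITUDE':  [10.0, 20.0, 30.0, 40.0, 50.0],
    'LONGITUDE': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Hypothetical groupby output: group label -> list of row indexes
indexes_per_group = {0: [1, 3], 1: [0, 2, 4]}

# Rename the columns once up front, then pull each group's rows with one
# .loc call per group and convert them to a list of dicts in one shot.
# This replaces the two scalar lookups per row in the original loop.
renamed = df[['LATITUDE', 'LONGITUDE']].rename(
    columns={'LATITUDE': 'lat', 'LONGITUDE': 'lng'})

total_list = [
    renamed.loc[idx_list].to_dict('records')
    for idx_list in indexes_per_group.values()
]
```

Each element of `total_list` is a list of `{'lat': ..., 'lng': ...}` dicts, matching the structure the original loop produces; only the presentation of the lookup changes, not the result.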