Speeding up an incredibly slow for loop
I have a for loop which iterates over a dictionary containing lists of indexes, for example {0: [1, 5, ...], 1: [2, 4, ...], ...}. I need to iterate through this and use the index values to produce new dictionaries based on an enormous dataframe (around 80 million rows).
As you can see from the code below, I have a method which does work, however, it takes an incredible amount of time. I am hoping there is a way to speed this up, by performing the operation in sections, multithreaded, or otherwise. I have seen other similar questions which suggest rewriting this in Cython, however, because I am working with a dataframe I don't see how that would be possible.
total_list = []
for ind in indexes_per_group:
    ship_list = []
    dataframe_indexes = indexes_per_group[ind]
    for index in dataframe_indexes:
        singleLocation_dict = {}
        singleLocation_dict['lat'] = df.loc[index]['LATITUDE']
        singleLocation_dict['lng'] = df.loc[index]['LONGITUDE']
        ship_list.append(singleLocation_dict)
    total_list.append(ship_list)
Many thanks for any help on this; it will be greatly appreciated.
Edit: The dictionary I am looping through comes from the pandas groupby function.
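One likely bottleneck in the loop above is the per-row `df.loc[index]` lookup, which is repeated twice for every index. A common speedup is to fetch each group's rows in a single `.loc` call and let pandas build the dicts with `to_dict('records')`. The sketch below illustrates this on a tiny stand-in dataframe (the data and the group dictionary are made up for the example; the real `df` and `indexes_per_group` come from the question):

```python
import pandas as pd

# Hypothetical small dataframe standing in for the ~80-million-row one
df = pd.DataFrame({
    'LATITUDE':  [10.0, 20.0, 30.0, 40.0, 50.0],
    'LONGITUDE': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Hypothetical groupby output: group label -> list of row indexes
indexes_per_group = {0: [1, 3], 1: [0, 2, 4]}

# Rename the columns once up front, then pull each group's rows with one
# .loc call per group and convert them to a list of dicts in one shot.
# This replaces the two scalar lookups per row in the original loop.
renamed = df[['LATITUDE', 'LONGITUDE']].rename(
    columns={'LATITUDE': 'lat', 'LONGITUDE': 'lng'})

total_list = [
    renamed.loc[idx_list].to_dict('records')
    for idx_list in indexes_per_group.values()
]
```

Each element of `total_list` is a list of `{'lat': ..., 'lng': ...}` dicts, matching the structure the original loop produces; only the presentation of the lookup changes, not the result.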