
Python Pandas Segmentation Fault - Summing Columns Together

Posted on 2025-01-17 05:59:01


I am working on a project for daily fantasy sports.

I have a dataframe containing possible lineups in it (6 columns, 1 for each player in a lineup).

As part of my process, I generate a possible fantasy point value for all players.

Next, I want to total the points scored for a lineup in my lineups dataframe by referencing the fantasy points dataframe.

For reference:

Lineups DataFrame: columns = F1, F2, F3, F4, F5, F6, where each cell holds a player's name + '_' + their player ID
Fantasy points DataFrame (sim_data in the code below): columns = Name_SlateID (player name + '_' + ID), Points
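A minimal sketch of the two inputs as described, to make the layout concrete. The player keys and point values are hypothetical; only the column names (F1..F6, Name_SlateID, Points) come from the post.

```python
import pandas as pd

# Hypothetical fantasy-points table: one row per player key with its simulated points
sim_data = pd.DataFrame({
    'Name_SlateID': ['A_1', 'B_2', 'C_3', 'D_4', 'E_5', 'F_6', 'G_7'],
    'Points': [10.5, 20.0, 7.25, 15.0, 3.5, 12.0, 9.0],
})

# Hypothetical lineups: each F1..F6 cell holds a "<name>_<id>" key found in sim_data
lineups = pd.DataFrame(
    [['A_1', 'B_2', 'C_3', 'D_4', 'E_5', 'F_6'],
     ['B_2', 'C_3', 'D_4', 'E_5', 'F_6', 'G_7']],
    columns=['F1', 'F2', 'F3', 'F4', 'F5', 'F6'],
)

print(lineups.shape)  # (2, 6)
```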

I go column by column through the 6 player slots to look up the 6 fantasy point values:

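for col in ['F1', 'F2', 'F3', 'F4', 'F5', 'F6']:
    # Rename 'Points' per slot so each join adds a distinct column (F1_points ... F6_points)
    points = sim_data[['Name_SlateID', 'Points']].rename(columns={'Points': f'{col}_points'})
    lineups = lineups.join(points.set_index('Name_SlateID'), how='left', on=col)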
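Because each slot is a single-key lookup, an alternative sketch (not the poster's code) is to map each F column through a Series indexed by Name_SlateID. This adds one F{n}_points column per slot without repeated joins or suffix handling. It assumes Name_SlateID values are unique in sim_data; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data in the shape described in the post
sim_data = pd.DataFrame({
    'Name_SlateID': ['A_1', 'B_2', 'C_3', 'D_4', 'E_5', 'F_6', 'G_7'],
    'Points': [10.5, 20.0, 7.25, 15.0, 3.5, 12.0, 9.0],
})
slots = ['F1', 'F2', 'F3', 'F4', 'F5', 'F6']
lineups = pd.DataFrame(
    [['A_1', 'B_2', 'C_3', 'D_4', 'E_5', 'F_6'],
     ['B_2', 'C_3', 'D_4', 'E_5', 'F_6', 'G_7']],
    columns=slots,
)

# Series keyed by player, so each slot lookup is a plain map (assumes unique keys)
points_by_player = sim_data.set_index('Name_SlateID')['Points']

for col in slots:
    # Keys missing from sim_data map to NaN, like a left join
    lineups[f'{col}_points'] = lineups[col].map(points_by_player)

print(lineups[[f'{c}_points' for c in slots]])
```

Here map is used in place of join; the result columns line up with the F1_points..F6_points names the summing step expects.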

Then, in what I thought would be the simplest part, I try to sum them and get "Segmentation fault: 11":

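sum_columns = ['F1_points', 'F2_points', 'F3_points', 'F4_points', 'F5_points', 'F6_points']

# reduce_memory_usage is the helper from the article linked below
lineups = reduce_memory_usage(lineups)

# i is presumably the simulation index from an enclosing loop not shown in the post
lineups[f'sim_{i}_points'] = lineups[sum_columns].sum(axis=1, skipna=True)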

reduce_memory_usage comes from this article: https://towardsdatascience.com/6-pandas-mistakes-that-silently-tell-you-are-a-rookie-b566a252e60d
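The article's reduce_memory_usage is not reproduced in the post, so here is a hypothetical stand-in (downcast_numeric, not the article's function) showing the usual idea with pd.to_numeric(..., downcast=...). One relevant design choice: pandas documents that downcast='float' stops at float32, so a helper built this way never produces the float16 columns seen in the dump below.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shrink numeric columns to the smallest safe dtype."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Smallest signed integer type that holds every value
            out[col] = pd.to_numeric(out[col], downcast='integer')
        elif pd.api.types.is_float_dtype(out[col]):
            # downcast='float' goes no lower than float32
            out[col] = pd.to_numeric(out[col], downcast='float')
    return out  # non-numeric (object) columns are left untouched


# Hypothetical example frame
demo = pd.DataFrame({'salary': [5000, 7200], 'points': [10.5, 20.0], 'name': ['A_1', 'B_2']})
slim = downcast_numeric(demo)
print(slim.dtypes)
```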

Before running this line, I reduced the DataFrame's memory usage by 50% by choosing appropriate dtypes. I have also tried pd.eval() instead, and summing the columns one at a time in a for loop, but nothing works.
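The post does not show what triggers the segfault, so this is only a diagnostic sketch under an assumption: if the float16 point columns produced by the memory reduction are involved, summing a float32 copy (or the underlying NumPy array) avoids reducing over float16 directly. The values are hypothetical, and sim_0_points stands in for the post's sim_{i}_points.

```python
import numpy as np
import pandas as pd

sum_columns = ['F1_points', 'F2_points', 'F3_points', 'F4_points', 'F5_points', 'F6_points']

# Hypothetical float16 point columns, mirroring the dtypes in the post's info dump
lineups = pd.DataFrame(
    [[10.5, 20.0, 7.25, 15.0, 3.5, 12.0],
     [20.0, 7.25, 15.0, 3.5, 12.0, 9.0]],
    columns=sum_columns,
).astype('float16')

# Upcast to float32 before the row-wise sum
totals = lineups[sum_columns].astype('float32').sum(axis=1, skipna=True)

# Equivalent NumPy route (no NaN handling needed here: the columns are non-null)
totals_np = lineups[sum_columns].to_numpy(dtype=np.float32).sum(axis=1)

lineups['sim_0_points'] = totals
print(totals.tolist())  # [68.25, 66.75]
```

If either route runs cleanly on the real data where the original line crashes, that points at the float16 columns; if not, the dtype is probably not the cause.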

Any help is greatly appreciated!

Edit:
Specs: OS: macOS Monterey 12.2.1, Python 3.8.8, pandas 1.4.1

Here are the details of my lineups dataframe right before the line causing the error:

Data columns (total 27 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   F1                  107056 non-null  object 
 1   F2                  107056 non-null  object 
 2   F3                  107056 non-null  object 
 3   F4                  107056 non-null  object 
 4   F5                  107056 non-null  object 
 5   F6                  107056 non-null  object 
 6   F1_own              107056 non-null  float16
 7   F1_salary           107056 non-null  int16  
 8   F2_own              107056 non-null  float16
 9   F2_salary           107056 non-null  int16  
 10  F3_own              107056 non-null  float16
 11  F3_salary           107056 non-null  int16  
 12  F4_own              107056 non-null  float16
 13  F4_salary           107056 non-null  int16  
 14  F5_own              107056 non-null  float16
 15  F5_salary           107056 non-null  int16  
 16  F6_own              107056 non-null  float16
 17  F6_salary           107056 non-null  int16  
 18  total_salary        107056 non-null  int32  
 19  dupes               107056 non-null  float32
 20  over_600_frequency  107056 non-null  int8   
 21  F1_points           107056 non-null  float16
 22  F2_points           107056 non-null  float16
 23  F3_points           107056 non-null  float16
 24  F4_points           107056 non-null  float16
 25  F5_points           107056 non-null  float16
 26  F6_points           107056 non-null  float16
dtypes: float16(12), float32(1), int16(6), int32(1), int8(1), object(6)
memory usage: 10.3+ MB
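The dump shows the F*_own and F*_points columns stored as float16. Separately from the crash, float16 has an 11-bit significand (roughly three significant decimal digits) and a maximum of 65504, so point values are rounded when stored. A quick check with a hypothetical value of 23.45:

```python
import numpy as np

# Hypothetical fantasy-point value; float16 spacing between 16 and 32 is 1/64
stored16 = np.float16(23.45)
stored32 = np.float32(23.45)

print(float(stored16))  # 23.453125
print(float(stored32))
print(np.finfo(np.float16).max)
```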
