Create a new dataframe containing the mean of some columns of an old dataframe

Posted on 2025-01-13 03:49:42

I have a dataframe loaded from a CSV file. I want to process it in blocks of n rows: for some columns the output should be the mean over the n rows, while for the remaining columns it should be the value from the first row of each block.

For example, the data extracted from the CSV consists of 100 rows and 6 columns.
I have a variable n_AVE = 6, which tells the code to average the data over every 6 rows.

rawDf = pd.read_csv(outputFilePath / 'Raw_data.csv', encoding='CP932')
OUT: 
       TIME      A       B        C        D        E
0     2021/3/4   148      0       142       0        1      [0]
1     2021/3/5   148      0       142       0        1
2     2021/3/6   150      0       148       0        1
3     2021/3/7   150      0       148       0        1
4     2021/3/8   151      0       148       0        1
5     2021/3/9   151      0       148       0        1
....
91    2021/4/30  195      5       180       0        1      [5]
92    2021/5/1   195      5       180       0        1
93    2021/5/2   195      5       180       0        1
94    2021/5/3   200      5       180       0        1
95    2021/5/4   200      0       200       0        1
96    2021/5/5   200      5       200       0        1      [6]
97    2021/5/6   200      5       200       1        1
98    2021/5/7   200      5       200       1        1
99    2021/5/8   205      5       210       1        1
100   2021/5/9   205      5       210       1        1

For the [TIME, D, E] columns, take only the first row of each block.
For the [A, B, C] columns, average the data over every n_AVE (6) rows.
I want to create a new dataframe that looks like this:

OUT: 
       TIME         A       B         C        D        E
0     2021/3/4    149.66    0        146       0        1
....
5     2021/4/30   197.5   4.166     186.66     0        1
6     2021/5/5    168.33    5        170       0        1

The code is like this:

for x in range(0,len(rawDf.index), n_AVE): 
    df = pd.DataFrame([rawDf.iloc[[x],0], rawDf.iloc[x:(x + n_AVE),1:3].mean(), rawDf.iloc[x,4:5]])

But the code does not work, apparently because when I use pandas.mean() the dataframe's format changes to this:

df2 = rawDf.iloc[0:6,1:3].mean()
print(df2)

OUT: 
        index      0
    0     A      149.66    
    1     B       0.0
    2     C      146.0      
    [3 rows x 2 columns]

How can I use pandas.mean() without losing the original format?
Or should I avoid pandas.mean() and write my own averaging code?

独守阴晴ぅ圆缺 2025-01-20 03:49:43

You can group the dataframe with the grouper np.arange(len(df)) // 6, which assigns every six consecutive rows to the same group, and then aggregate each column with the desired aggregation function. Optionally, reindex along axis=1 to restore the original column order.

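import numpy as np

# df is the dataframe read from the CSV (rawDf in the question)
# Aggregation to apply to each column within a block
d = {
    'A': 'mean', 'B': 'mean', 'C': 'mean',
    'TIME': 'first', 'D': 'first', 'E': 'first'
}

# Rows 0-5 get label 0, rows 6-11 get label 1, and so on
df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)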
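
To see how the grouper assigns rows to blocks, here is a minimal sketch. The names n_rows and n_ave are illustrative and not part of the code above; the row count of 14 is an arbitrary assumption chosen to show a shorter final block.

```python
import numpy as np

n_rows = 14  # hypothetical number of rows in the dataframe
n_ave = 6    # block size, playing the role of n_AVE in the question

# Integer division maps each row position to a block number
labels = np.arange(n_rows) // n_ave
print(labels.tolist())
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
```

When the row count is not a multiple of the block size, the last block is simply shorter, and its mean is taken over the rows it actually contains.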

Define the aggregation functions using column positions:

d = {
    **dict.fromkeys(df.columns[[0, 4, 5]], 'first'),
    **dict.fromkeys(df.columns[[1, 2, 3]], 'mean' )
}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)
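
As a quick check of what the positional version builds, the sketch below constructs the same mapping from a standalone column index. The variable columns is a stand-in for df.columns and is an assumption made only for this illustration.

```python
import pandas as pd

# Stand-in for df.columns, using the column names from the question
columns = pd.Index(["TIME", "A", "B", "C", "D", "E"])

# Positions 0, 4, 5 take the first value; positions 1, 2, 3 are averaged
d = {
    **dict.fromkeys(columns[[0, 4, 5]], "first"),
    **dict.fromkeys(columns[[1, 2, 3]], "mean"),
}
print(d)
# {'TIME': 'first', 'D': 'first', 'E': 'first', 'A': 'mean', 'B': 'mean', 'C': 'mean'}
```

The dictionary keys are not in the original column order, which is why the reindex step is useful after aggregating.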

Result

        TIME           A         B           C  D  E
0   2021/3/4  149.666667  0.000000  146.000000  0  1
1  2021/4/30  197.500000  4.166667  186.666667  0  1
2   2021/5/6  202.500000  5.000000  205.000000  1  1
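
For a self-contained check, the sketch below rebuilds only the 16 rows that are actually shown in the question and uses n_AVE instead of a hard-coded 6. Leaving out the elided rows is an assumption, although it appears to match the result above; the names rows, agg_map, and block_id are illustrative.

```python
import numpy as np
import pandas as pd

n_AVE = 6  # block size from the question

# Only the rows visible in the question; the elided rows are omitted
rows = [
    ("2021/3/4", 148, 0, 142, 0, 1),
    ("2021/3/5", 148, 0, 142, 0, 1),
    ("2021/3/6", 150, 0, 148, 0, 1),
    ("2021/3/7", 150, 0, 148, 0, 1),
    ("2021/3/8", 151, 0, 148, 0, 1),
    ("2021/3/9", 151, 0, 148, 0, 1),
    ("2021/4/30", 195, 5, 180, 0, 1),
    ("2021/5/1", 195, 5, 180, 0, 1),
    ("2021/5/2", 195, 5, 180, 0, 1),
    ("2021/5/3", 200, 5, 180, 0, 1),
    ("2021/5/4", 200, 0, 200, 0, 1),
    ("2021/5/5", 200, 5, 200, 0, 1),
    ("2021/5/6", 200, 5, 200, 1, 1),
    ("2021/5/7", 200, 5, 200, 1, 1),
    ("2021/5/8", 205, 5, 210, 1, 1),
    ("2021/5/9", 205, 5, 210, 1, 1),
]
rawDf = pd.DataFrame(rows, columns=["TIME", "A", "B", "C", "D", "E"])

# Mean for A, B, C; first row of each block for TIME, D, E
agg_map = {
    "TIME": "first", "A": "mean", "B": "mean",
    "C": "mean", "D": "first", "E": "first",
}

# One label per row: 0 for the first n_AVE rows, 1 for the next, and so on
block_id = np.arange(len(rawDf)) // n_AVE

result = rawDf.groupby(block_id).agg(agg_map).reindex(rawDf.columns, axis=1)
print(result)
```

The last block here holds only four rows, so its means are taken over four values rather than six.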
