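How do I iterate over a dataframe and get output for each group? Right now I only get one row, and one of the groups is not recognized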
I need to iterate through each dataset in the dataframe based on multiple indexes ('Treatment', 'individual', 'regime'). I want to apply a curve fit to x and y for each combination of Treatment, individual and regime. Currently I am able to use only one index.
This is the dataframe
df_tot
Treatment y x individual regime
0 White 21.982733 800 Data20210608 Ctrl
1 White 21.973003 800 Data20210508 Ctrl
2 White 21.968242 800 Data20210408 Ctrl
3 White 21.982733 600 Data20210608 Ctrl
4 White 21.973003 600 Data20210508 Ctrl
5 White 21.968242 600 Data20210408 Ctrl
6 White 21.982733 500 Data20210608 Ctrl
7 White 21.973003 500 Data20210508 Ctrl
5 White 21.968242 500 Data20210408 Ctrl
15 White_FR 22.139293 800 Data20210608 Ctrl
16 White_FR 22.159840 800 Data20210508 Ctrl
17 White_FR 22.162254 800 Data20210408 Ctrl
18 White_FR 22.139293 600 Data20210608 Ctrl
19 White_FR 22.159840 600 Data20210508 Ctrl
20 White_FR 22.162254 600 Data20210408 Ctrl
21 White_FR 22.139293 500 Data20210608 Ctrl
22 White_FR 22.159840 500 Data20210508 Ctrl
23 White_FR 22.162254 500 Data20210408 Ctrl
2500 White 1.864671 800 Data20210708 T
2501 White 1.871709 800 Data20210608 T
2502 White 1.884706 800 Data20210508 T
2503 White 1.872854 600 Data20210708 T
2504 White 1.872233 600 Data20210608 T
2505 White 1.872344 600 Data20210508 T
2506 White 1.872854 500 Data20210708 T
2507 White 1.872233 500 Data20210608 T
2508 White 1.872344 500 Data20210508 T
2519 White_FR 1.882861 800 Data20210708 T
2520 White_FR 1.917002 800 Data20210608 T
2521 White_FR 1.903067 800 Data20210508 T
2519 White_FR 1.882861 600 Data20210708 T
2520 White_FR 1.917002 600 Data20210608 T
2521 White_FR 1.903067 600 Data20210508 T
2519 White_FR 1.882861 500 Data20210708 T
2520 White_FR 1.917002 500 Data20210608 T
2521 White_FR 1.903067 500 Data20210508 T
This is the code:
variables = {'Spectrum': Spectrum, 'date': date, 'regime': regime,
             'slope': float}
results = pd.DataFrame(variables, index=[])
group_df = df_tot.groupby(["Spectrum", "date", "regime", "PPFD",
                           "start"])

def model(x, slope):
    return (slope * x) + start

group_df.apply(lambda x: curve_fit(model, x.loc[:, 'PPFD'],
                                   x.loc[:, 'Photo']))
new_row = {'Spectrum': Spectrum, 'date': date, 'regime': regime,
           'slope': popt[0]}  ## adding Spectrum gives an error:
                              # name 'Spectrum' is not defined
results = results.append(new_row, ignore_index=True)
Now I get
results
date regime slope
0 Data20210608 Ctrl 0.05
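The code above can only ever produce one row: the value returned by group_df.apply(...) is discarded, and new_row is built and appended once, after that call. A minimal sketch of the usual fix, assuming the column names of df_tot as printed: build one dict per group inside a loop over the groups and create the results DataFrame once at the end. np.polyfit with degree 1 is used here only as a simple stand-in for the question's curve_fit. Collecting a list also avoids DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Rows copied from df_tot in the question (two groups)
df_tot = pd.DataFrame({
    "Treatment": ["White"] * 3 + ["White_FR"] * 3,
    "y": [21.982733] * 3 + [22.139293] * 3,
    "x": [800, 600, 500] * 2,
    "individual": ["Data20210608"] * 6,
    "regime": ["Ctrl"] * 6,
})

rows = []  # one dict per group, filled inside the loop
for (treatment, individual, regime), g in df_tot.groupby(
        ["Treatment", "individual", "regime"]):
    # np.polyfit(deg=1) stands in for curve_fit: it returns [slope, intercept]
    slope, intercept = np.polyfit(g["x"], g["y"], 1)
    rows.append({"Treatment": treatment, "individual": individual,
                 "regime": regime, "slope": slope})

# Build the results table once, after every group has been processed
results = pd.DataFrame(rows)
print(results)
```

Grouping on a list of columns makes each key a tuple, so the three labels can be unpacked directly in the for statement and written into the row.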
You can absolutely iterate through a dataframe grouped by more than one column.
First of all, there are some major issues with your code:
- Don't use del to delete columns from a dataframe; use drop, or select all but one column using loc or iloc.
- In "all = [df_Ctrl, df_FR]", all already has a specific meaning in Python (it is a built-in function), so you should pick another name.
- In "for g in all:  #if I put for key, g in all", all is a list of two elements, so there is nothing to unpack here.
- Don't use [[]] to select a sub-dataframe of a dataframe; use loc or iloc instead.
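A small sketch of these points on made-up toy frames (df_ctrl, df_fr, dfs and the tmp column are hypothetical, since the code they refer to is not shown): drop or a loc mask instead of del, a name other than all, a dict when each frame needs a label to unpack, and loc for selecting a sub-frame.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_Ctrl and df_FR
df_ctrl = pd.DataFrame({"x": [800, 600], "y": [21.98, 21.97], "tmp": [0, 1]})
df_fr = pd.DataFrame({"x": [800, 600], "y": [22.14, 22.16], "tmp": [0, 1]})

# Instead of `del df_ctrl["tmp"]`: drop returns a new frame without that column...
no_tmp = df_ctrl.drop(columns=["tmp"])
# ...or keep every column except one with loc and a boolean column mask
no_tmp_loc = df_ctrl.loc[:, df_ctrl.columns != "tmp"]

# A name other than `all`, so the built-in all() stays usable; a dict also
# gives each frame a label, so `for key, g in ...` has a pair to unpack
dfs = {"Ctrl": df_ctrl, "FR": df_fr}
for key, g in dfs.items():
    print(key, g.shape)

# Sub-frame selection with loc instead of [[]]: all rows, columns x and y
sub = df_fr.loc[:, ["x", "y"]]
print(sub)
```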
If I understand your problem correctly, you want to group the elements of your dataframe by three columns, 'Treatment', 'individual' and 'regime', and then, for each group, perform a specified operation on x and y. You can adapt this:
Obviously, since you didn't provide model or curve_fit, I can't test whether it is correct. But the main idea is here, and you can work from it.
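For reference, a minimal sketch of fitting every (Treatment, individual, regime) group in one pass, assuming a straight-line model y = slope * x + intercept fitted with scipy.optimize.curve_fit. The helper fit_group and the fitted intercept (in place of the question's fixed start) are hypothetical. The function passed to groupby(...).apply returns a Series, so pandas assembles one row per group with the three keys as the index.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Rows copied from df_tot in the question: one Ctrl group and one T group
df_tot = pd.DataFrame({
    "Treatment": ["White"] * 6,
    "y": [21.982733] * 3 + [1.864671, 1.872854, 1.872854],
    "x": [800, 600, 500] * 2,
    "individual": ["Data20210608"] * 3 + ["Data20210708"] * 3,
    "regime": ["Ctrl"] * 3 + ["T"] * 3,
})

def model(x, slope, intercept):
    # Assumed straight line; intercept is fitted here instead of a fixed `start`
    return slope * x + intercept

def fit_group(g):
    # Fit one group's points and return the parameters as a labelled Series
    popt, _ = curve_fit(model, g["x"].to_numpy(float), g["y"].to_numpy(float))
    return pd.Series({"slope": popt[0], "intercept": popt[1]})

keys = ["Treatment", "individual", "regime"]
# Selecting x and y after grouping keeps the key columns out of fit_group;
# apply stacks the returned Series into one row per group
results = df_tot.groupby(keys)[["x", "y"]].apply(fit_group).reset_index()
print(results)
```

Compared with an explicit loop, this keeps the group labels attached automatically; reset_index turns them back into ordinary columns.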