Pandas t-test using rows as arrays
I need to find a way to calculate a p-value for two sets of data, comparing each row in one DataFrame with the accompanying row in another DataFrame. For example, array1 would be the five items in row 300 (not including stdev and Ctrl average), and same for array2 with the five items in row 300.
df1:
     Pep Ctrl 1  Pep Ctrl 2  Pep Ctrl 3  Pep Ctrl 4  Pep Ctrl 5         stdev  Ctrl average
300  47591000.0         NaN  49576000.0  41288000.0  61727000.0  8.551730e+06  4.174675e+07
301   4305900.0   2670800.0         NaN         NaN   7338400.0  2.368407e+06  4.170877e+06
302  11466000.0   3799400.0         NaN  18552000.0  31661000.0  1.184124e+07  1.546393e+07
303  11255000.0   5402300.0  18337000.0  19706000.0  40286000.0  1.321849e+07  1.803413e+07
df2:
     MCI 1 vs Ctrl normalized  MCI 2 vs Ctrl normalized  MCI 3 vs Ctrl normalized  MCI 4 vs Ctrl normalized  MCI 5 vs Ctrl normalized         stdev
300              1.054045e+08              4.980206e+07              4.764870e+07              1.834201e+07              2.994124e+07  3.346473e+07
301              1.019931e+07              3.309509e+06              6.595145e+06              1.089385e+07                       NaN  3.508776e+06
302              3.288333e+07              6.953062e+06              1.430190e+07              4.988915e+06              2.310888e+07  1.162495e+07
303              3.332308e+07              1.682790e+07              2.951138e+07              9.474570e+06              2.965893e+07  1.014219e+07
I need to do a two-tailed t test with equal variances, and then add this as the last column. Alternatively, if SciPy has an option to just input the number of items, standard deviation, and average, this could also work.
This is what I tried:
from scipy import stats

group1 = [df1['Pep Ctrl 1'], df1['Pep Ctrl 2'], df1['Pep Ctrl 3'], df1['Pep Ctrl 4'], df1['Pep Ctrl 5']]
group2 = [df2['MCI 1 vs Ctrl normalized'], df2['MCI 2 vs Ctrl normalized'], df2['MCI 3 vs Ctrl normalized'], df2['MCI 4 vs Ctrl normalized'], df2['MCI 5 vs Ctrl normalized']]
ttest = stats.ttest_ind(a=group1, b=group2, axis=1, equal_var=True)
Any help would be appreciated.
df1 constructor:
{'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0]}
df2 constructor:
{'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0]}
Comments (1)
You could use iterrows to iterate over df1 and compare each row with the corresponding row in df2 that has the same index.
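A minimal sketch of the iterrows approach, under a few assumptions: both frames are indexed 300 to 303 as in the printed tables, the replicate columns are picked by their "Pep Ctrl" and "MCI" prefixes, NaN values are dropped per row before testing, and the new column is called p_value. None of these choices come from the original answer; they are only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

nan = np.nan
index = [300, 301, 302, 303]  # assumed from the printed tables

df1 = pd.DataFrame({
    'Pep Ctrl 1': [47591000.0, 4305900.0, 11466000.0, 11255000.0],
    'Pep Ctrl 2': [nan, 2670800.0, 3799400.0, 5402300.0],
    'Pep Ctrl 3': [49576000.0, nan, nan, 18337000.0],
    'Pep Ctrl 4': [41288000.0, nan, 18552000.0, 19706000.0],
    'Pep Ctrl 5': [61727000.0, 7338400.0, 31661000.0, 40286000.0],
    'stdev': [8551730.0, 2368407.0, 11841240.0, 13218490.0],
    'Ctrl average': [41746750.0, 4170877.0, 15463930.0, 18034130.0],
}, index=index)

df2 = pd.DataFrame({
    'MCI 1 vs Ctrl normalized': [105404500.0, 10199310.0, 32883330.0, 33323080.0],
    'MCI 2 vs Ctrl normalized': [49802060.0, 3309509.0, 6953062.0, 16827900.0],
    'MCI 3 vs Ctrl normalized': [47648700.0, 6595145.0, 14301900.0, 29511380.0],
    'MCI 4 vs Ctrl normalized': [18342010.0, 10893850.0, 4988915.0, 9474570.0],
    'MCI 5 vs Ctrl normalized': [29941240.0, nan, 23108880.0, 29658930.0],
    'stdev': [33464730.0, 3508776.0, 11624950.0, 10142190.0],
}, index=index)

# Select only the replicate columns, leaving out stdev and Ctrl average.
ctrl_cols = [c for c in df1.columns if c.startswith('Pep Ctrl')]
mci_cols = [c for c in df2.columns if c.startswith('MCI')]

p_values = {}
for idx, row in df1[ctrl_cols].iterrows():
    # Drop missing replicates so they do not turn the result into NaN.
    a = row.dropna().to_numpy(dtype=float)
    b = df2.loc[idx, mci_cols].dropna().to_numpy(dtype=float)
    # ttest_ind is two-sided by default; equal_var=True gives Student's t-test.
    p_values[idx] = stats.ttest_ind(a, b, equal_var=True).pvalue

# Add the p-values as the last column, aligned on the shared index.
df1['p_value'] = pd.Series(p_values)
print(df1['p_value'])
```

For the alternative raised in the question, SciPy also provides stats.ttest_ind_from_stats, which takes the mean, standard deviation, and number of observations of each group instead of the raw values. Passing the two column selections directly to stats.ttest_ind with axis=1 and nan_policy='omit' may avoid the loop as well, though that variant is not shown here.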