Split a dataframe by row values
I want to split my dataframe based on the first row to generate four separate dataframes (for subgroup analysis). I have a 172x106 Excel file in which every value in the first row is 1, 2, 3, or 4. The other 171 rows are radiomic features, which I want to copy to the "new" datasets. The columns do not have header names. My data looks like the following:
{0: [4.0, 0.65555056, 0.370511262, 16.5876203, 44.76954415, 48.0, 32.984845, 49.47726751, 49.47726751, 13133.33333, 29.34869973, 0.725907513, 3708.396349, 0.282365204, 13696.0, 2.122884402, 3.039611259, 1419.058749, 1.605529827, 0.488297449], 1: [2.0, 0.82581372, 0.33201741, 20.65753167, 62.21821817, 50.59644256, 62.60990337, 55.56977596, 77.35631842, 23890.66667, 51.38065822, 0.521666786, 7689.706847, 0.321870752, 25152.0, 1.022813615, 1.360453239, 548.2156387, 0.314035581, 0.181204079]}
I wanted to use groupby, but since the columns have no header names, it is hard to implement. This is my current code:
import numpy as np
import pandas as pd
df = pd.read_excel(r'H:\Documenten\MATLAB\sample_file.xlsx',header=None)
Class_1=df.groupby(df.T.loc[:,0])
df_new = Class_1.get_group("1")
print(df_new)
The error I get is the following:
Traceback (most recent call last):
File "H:/PycharmProjects/RadiomicsPipeline/spearman_subgroups.py", line 5, in <module>
df_new = Class_1.get_group("1")
File "C:\Users\cpullen\AppData\Roaming\Python\Python37\site-packages\pandas\core\groupby\groupby.py", line 754, in get_group
raise KeyError(name)
KeyError: '1'
How do I implement the separation of the dataframes by row values?
Comments (2)
I am not sure that this is the result you want. If not, please clearly show the desired output.
You can simply achieve what you want by transposing your dataframe.
[Outputs of df.head(), df_transpose.head(), and list_df not preserved]
Individual dataframes: list_df[1.0], list_df[2.0], list_df[3.0] (which does not exist in this example), list_df[4.0]
(BTW, I defined a wrong variable name. The name should be dict_df, not list_df.)
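The code for this answer did not survive, so the following is only a minimal sketch of the transpose-and-group idea, not the answerer's exact code. The names df_transpose and dict_df come from the answer; the small dataframe is a hypothetical stand-in for the Excel file, and dropping the label column from each group is an assumption about the desired output.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file: row 0 holds the class label,
# the remaining rows hold features, and each column is one sample.
df = pd.DataFrame({
    0: [4.0, 0.65555056, 0.370511262],
    1: [2.0, 0.82581372, 0.33201741],
    2: [4.0, 0.5, 0.25],
    3: [1.0, 0.75, 0.125],
})

# Transpose so that each sample becomes a row and the label sits in column 0.
df_transpose = df.T

# Group the rows by the label column and keep one dataframe per label.
# Assumption: the label column itself is not wanted in the result.
dict_df = {
    label: group.drop(columns=0)
    for label, group in df_transpose.groupby(df_transpose[0])
}

print(sorted(dict_df))
print(dict_df[4.0])
```

Note that the group keys are the floats read from the file (1.0, 2.0, 4.0), so a string key such as "1" is not found, which matches the KeyError in the question.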
First of all, after importing the dataframe, sort the columns by the values in the first row, so that you have a dataframe whose first-row values are in order. After that, you can use df.iloc to rename your columns with those values and drop the first row. Finally, you can slice based on the column names.
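One possible reading of these steps, as a rough sketch: the sample data and the variable names (df_sorted, features, class_4) are hypothetical, and selecting with a boolean mask on the column names is a choice made here because the labels repeat.

```python
import pandas as pd

# Hypothetical stand-in for the Excel file: row 0 holds the class label.
df = pd.DataFrame({
    0: [4.0, 0.1, 0.2],
    1: [2.0, 0.3, 0.4],
    2: [4.0, 0.5, 0.6],
    3: [1.0, 0.7, 0.8],
})

# 1. Sort the columns by the values in the first row (index label 0).
df_sorted = df.sort_values(by=0, axis=1)

# 2. Use the first row as the column names, then drop that row.
features = df_sorted.iloc[1:].copy()
features.columns = df_sorted.iloc[0]

# 3. Slice by column name. A boolean mask returns a dataframe even when
#    only one column carries the requested label.
class_4 = features.loc[:, features.columns == 4.0]
print(class_4)
```

Because several columns share the same name after renaming, plain label lookups such as features[4.0] return every matching column; the mask form makes that explicit.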