按行值分开数据框

发布于 2025-02-14 01:30:36 字数 1307 浏览 3 评论 0原文

我想根据第一行分开数据框,以生成四个单独的数据范围(用于亚组分析)。我有一个172x106 excel文件,其中第一行由1、2、3或4组成。其他171行是放射线特征,我想将其复制到''new''数据集。这些列没有名称。我的数据看起来如下:

{0: [4.0, 0.65555056, 0.370511262, 16.5876203, 44.76954415, 48.0, 32.984845, 49.47726751, 49.47726751, 13133.33333, 29.34869973, 0.725907513, 3708.396349, 0.282365204, 13696.0, 2.122884402, 3.039611259, 1419.058749, 1.605529827, 0.488297449], 1: [2.0, 0.82581372, 0.33201741, 20.65753167, 62.21821817, 50.59644256, 62.60990337, 55.56977596, 77.35631842, 23890.66667, 51.38065822, 0.521666786, 7689.706847, 0.321870752, 25152.0, 1.022813615, 1.360453239, 548.2156387, 0.314035581, 0.181204079]}


我想使用groupby,但是由于列标题没有名称,因此很难实现。这是我当前的代码:

import numpy as np
df = pd.read_excel(r'H:\Documenten\MATLAB\sample_file.xlsx',header=None)
Class_1=df.groupby(df.T.loc[:,0])
df_new = Class_1.get_group("1")
print(df_new) 

我遇到的错误是:

Traceback (most recent call last):
  File "H:/PycharmProjects/RadiomicsPipeline/spearman_subgroups.py", line 5, in <module>
    df_new = Class_1.get_group("1")
  File "C:\Users\cpullen\AppData\Roaming\Python\Python37\site-packages\pandas\core\groupby\groupby.py", line 754, in get_group
    raise KeyError(name)
KeyError: '1'

如何按行值实现数据框的分离?

I want to split my dataframe based on the first row to generate four separate dataframes (for subgroup analysis). I have a 172x106 Excel file, where the first row consists of either a 1, 2, 3, or 4. The other 171 lines are radiomic features, which I want to copy to the ''new'' dataset. The columns do not have headernames. My data looks like the following:

{0: [4.0, 0.65555056, 0.370511262, 16.5876203, 44.76954415, 48.0, 32.984845, 49.47726751, 49.47726751, 13133.33333, 29.34869973, 0.725907513, 3708.396349, 0.282365204, 13696.0, 2.122884402, 3.039611259, 1419.058749, 1.605529827, 0.488297449], 1: [2.0, 0.82581372, 0.33201741, 20.65753167, 62.21821817, 50.59644256, 62.60990337, 55.56977596, 77.35631842, 23890.66667, 51.38065822, 0.521666786, 7689.706847, 0.321870752, 25152.0, 1.022813615, 1.360453239, 548.2156387, 0.314035581, 0.181204079]}

I wanted to use groupby, but since the column headers have no name, it makes it hard to implement. This is my current code:

import numpy as np
df = pd.read_excel(r'H:\Documenten\MATLAB\sample_file.xlsx',header=None)
Class_1=df.groupby(df.T.loc[:,0])
df_new = Class_1.get_group("1")
print(df_new) 

The error I get is the following:

Traceback (most recent call last):
  File "H:/PycharmProjects/RadiomicsPipeline/spearman_subgroups.py", line 5, in <module>
    df_new = Class_1.get_group("1")
  File "C:\Users\cpullen\AppData\Roaming\Python\Python37\site-packages\pandas\core\groupby\groupby.py", line 754, in get_group
    raise KeyError(name)
KeyError: '1'

How do I implement the separation of the dataframes by row values?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

木有鱼丸 2025-02-21 01:30:36

我不确定这是您想要的结果。如果没有,请清楚地显示所需的输出。

您可以简单地通过转换数据框来实现所需的目标。

import numpy as np
import pandas as pd

np.random.seed(0)

n = 5
data = np.stack((np.random.choice([1, 2, 3, 4], n), np.random.rand(n), np.random.rand(n), np.random.rand(n)), axis=0)
df = pd.DataFrame(data)

df.head()

0   1   2   3   4
0   1.000000    4.000000    2.000000    1.000000    4.000000
1   0.857946    0.847252    0.623564    0.384382    0.297535
2   0.056713    0.272656    0.477665    0.812169    0.479977
3   0.392785    0.836079    0.337396    0.648172    0.368242
df_transpose = df.transpose()
df_transpose.columns = ['group'] + list(df_transpose.columns.values[1:])

group   1   2   3
0   1.0 0.857946    0.056713    0.392785
1   4.0 0.847252    0.272656    0.836079
2   2.0 0.623564    0.477665    0.337396
3   1.0 0.384382    0.812169    0.648172
4   4.0 0.297535    0.479977    0.368242
list_df = dict()

for group in set(df_transpose['group']):
    list_df[group] = df_transpose.loc[df_transpose['group'] == group]

df_transpose.head

{
 1.0:
     group         1         2         3
     0    1.0  0.857946  0.056713  0.392785
     3    1.0  0.384382  0.812169  0.648172,
 2.0:
     group         1         2         3
     2    2.0  0.623564  0.477665  0.337396,
 4.0:
     group         1         2         3
     1    4.0  0.847252  0.272656  0.836079
     4    4.0  0.297535  0.479977  0.368242
}

list_df [2.0]list_df [3.0](在此示例中不存在), list_df [4.0]

(顺便说一句。

I am not sure that this is the result you want. If not, please clearly show the desired output.

You can simply achieve what you want by transposing your dataframe.

import numpy as np
import pandas as pd

np.random.seed(0)

n = 5
data = np.stack((np.random.choice([1, 2, 3, 4], n), np.random.rand(n), np.random.rand(n), np.random.rand(n)), axis=0)
df = pd.DataFrame(data)

df.head():

0   1   2   3   4
0   1.000000    4.000000    2.000000    1.000000    4.000000
1   0.857946    0.847252    0.623564    0.384382    0.297535
2   0.056713    0.272656    0.477665    0.812169    0.479977
3   0.392785    0.836079    0.337396    0.648172    0.368242
df_transpose = df.transpose()
df_transpose.columns = ['group'] + list(df_transpose.columns.values[1:])

df_transpose.head():

group   1   2   3
0   1.0 0.857946    0.056713    0.392785
1   4.0 0.847252    0.272656    0.836079
2   2.0 0.623564    0.477665    0.337396
3   1.0 0.384382    0.812169    0.648172
4   4.0 0.297535    0.479977    0.368242
list_df = dict()

for group in set(df_transpose['group']):
    list_df[group] = df_transpose.loc[df_transpose['group'] == group]

list_df:

{
 1.0:
     group         1         2         3
     0    1.0  0.857946  0.056713  0.392785
     3    1.0  0.384382  0.812169  0.648172,
 2.0:
     group         1         2         3
     2    2.0  0.623564  0.477665  0.337396,
 4.0:
     group         1         2         3
     1    4.0  0.847252  0.272656  0.836079
     4    4.0  0.297535  0.479977  0.368242
}

Individual dataframes:

list_df[1.0], list_df[2.0], list_df[3.0] (which does not exist in this example), list_df[4.0]

(BTW. I defined a wrong variable name. The name should be dict_df, not list_df.)

流星番茄 2025-02-21 01:30:36

首先,在导入数据框后,按顺序对第一行的值进行排序,

df
Out[26]: 
   0  1  2  3
0  0  1  0  1
1  1  2  5  8
2  2  3  6  9
3  3  4  7  0

df = df.sort_values(by = 0, axis = 1)

Out[30]:
   0  2  1  3
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

您应该具有带有第1行值的数据框架。之后,您可以使用df.iloc重命名您的列名称,然后删除第一行。

df.columns = df.iloc[0]

0  0  0  1  1
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

df.drop(0, inplace=True)

df.drop(0)
Out[51]: 
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

最终,您可以根据列名进行切片。

df_1 = df[1]

df_1
Out[56]: 
0  1  1
1  2  8
2  3  9
3  4  0

First of all, after importing the dataframe, sort the value of the first row in order

df
Out[26]: 
   0  1  2  3
0  0  1  0  1
1  1  2  5  8
2  2  3  6  9
3  3  4  7  0

df = df.sort_values(by = 0, axis = 1)

Out[30]:
   0  2  1  3
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

You should have a dataframe with ordered 1st row values. After that, you can use df.iloc to rename your column name and you will drop the first row.

df.columns = df.iloc[0]

0  0  0  1  1
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

df.drop(0, inplace=True)

df.drop(0)
Out[51]: 
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

Eventually, you can do slicing based on the column name.

df_1 = df[1]

df_1
Out[56]: 
0  1  1
1  2  8
2  3  9
3  4  0
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文