按行值分开数据框

发布于 2025-02-14 01:30:36 字数 1307 浏览 3 评论 0原文

我想根据第一行分开数据框，以生成四个单独的数据范围（用于亚组分析）。我有一个172x106 excel文件，其中第一行由1、2、3或4组成。其他171行是放射线特征，我想将其复制到''new''数据集。这些列没有名称。我的数据看起来如下：

{0: [4.0, 0.65555056, 0.370511262, 16.5876203, 44.76954415, 48.0, 32.984845, 49.47726751, 49.47726751, 13133.33333, 29.34869973, 0.725907513, 3708.396349, 0.282365204, 13696.0, 2.122884402, 3.039611259, 1419.058749, 1.605529827, 0.488297449], 1: [2.0, 0.82581372, 0.33201741, 20.65753167, 62.21821817, 50.59644256, 62.60990337, 55.56977596, 77.35631842, 23890.66667, 51.38065822, 0.521666786, 7689.706847, 0.321870752, 25152.0, 1.022813615, 1.360453239, 548.2156387, 0.314035581, 0.181204079]}

我想使用groupby，但是由于列标题没有名称，因此很难实现。这是我当前的代码：

import numpy as np
df = pd.read_excel(r'H:\Documenten\MATLAB\sample_file.xlsx',header=None)
Class_1=df.groupby(df.T.loc[:,0])
df_new = Class_1.get_group("1")
print(df_new)

我遇到的错误是：

Traceback (most recent call last):
  File "H:/PycharmProjects/RadiomicsPipeline/spearman_subgroups.py", line 5, in <module>
    df_new = Class_1.get_group("1")
  File "C:\Users\cpullen\AppData\Roaming\Python\Python37\site-packages\pandas\core\groupby\groupby.py", line 754, in get_group
    raise KeyError(name)
KeyError: '1'

如何按行值实现数据框的分离？

原文

I want to split my dataframe based on the first row to generate four separate dataframes (for subgroup analysis). I have a 172x106 Excel file, where the first row consists of either a 1, 2, 3, or 4. The other 171 lines are radiomic features, which I want to copy to the ''new'' dataset. The columns do not have headernames. My data looks like the following:

{0: [4.0, 0.65555056, 0.370511262, 16.5876203, 44.76954415, 48.0, 32.984845, 49.47726751, 49.47726751, 13133.33333, 29.34869973, 0.725907513, 3708.396349, 0.282365204, 13696.0, 2.122884402, 3.039611259, 1419.058749, 1.605529827, 0.488297449], 1: [2.0, 0.82581372, 0.33201741, 20.65753167, 62.21821817, 50.59644256, 62.60990337, 55.56977596, 77.35631842, 23890.66667, 51.38065822, 0.521666786, 7689.706847, 0.321870752, 25152.0, 1.022813615, 1.360453239, 548.2156387, 0.314035581, 0.181204079]}

I wanted to use groupby, but since the column headers have no name, it makes it hard to implement. This is my current code:

import numpy as np
df = pd.read_excel(r'H:\Documenten\MATLAB\sample_file.xlsx',header=None)
Class_1=df.groupby(df.T.loc[:,0])
df_new = Class_1.get_group("1")
print(df_new)

The error I get is the following:

Traceback (most recent call last):
  File "H:/PycharmProjects/RadiomicsPipeline/spearman_subgroups.py", line 5, in <module>
    df_new = Class_1.get_group("1")
  File "C:\Users\cpullen\AppData\Roaming\Python\Python37\site-packages\pandas\core\groupby\groupby.py", line 754, in get_group
    raise KeyError(name)
KeyError: '1'

How do I implement the separation of the dataframes by row values?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

木有鱼丸 2025-02-21 01:30:36

我不确定这是您想要的结果。如果没有，请清楚地显示所需的输出。

您可以简单地通过转换数据框来实现所需的目标。

import numpy as np
import pandas as pd

np.random.seed(0)

n = 5
data = np.stack((np.random.choice([1, 2, 3, 4], n), np.random.rand(n), np.random.rand(n), np.random.rand(n)), axis=0)
df = pd.DataFrame(data)

df.head（）：

0   1   2   3   4
0   1.000000    4.000000    2.000000    1.000000    4.000000
1   0.857946    0.847252    0.623564    0.384382    0.297535
2   0.056713    0.272656    0.477665    0.812169    0.479977
3   0.392785    0.836079    0.337396    0.648172    0.368242

df_transpose = df.transpose()
df_transpose.columns = ['group'] + list(df_transpose.columns.values[1:])

（

group   1   2   3
0   1.0 0.857946    0.056713    0.392785
1   4.0 0.847252    0.272656    0.836079
2   2.0 0.623564    0.477665    0.337396
3   1.0 0.384382    0.812169    0.648172
4   4.0 0.297535    0.479977    0.368242

list_df = dict()

for group in set(df_transpose['group']):
    list_df[group] = df_transpose.loc[df_transpose['group'] == group]

df_transpose.head

{
 1.0:
     group         1         2         3
     0    1.0  0.857946  0.056713  0.392785
     3    1.0  0.384382  0.812169  0.648172,
 2.0:
     group         1         2         3
     2    2.0  0.623564  0.477665  0.337396,
 4.0:
     group         1         2         3
     1    4.0  0.847252  0.272656  0.836079
     4    4.0  0.297535  0.479977  0.368242
}

） list_df [2.0]，list_df [3.0]（在此示例中不存在）， list_df [4.0]

（顺便说一句。

I am not sure that this is the result you want. If not, please clearly show the desired output.

You can simply achieve what you want by transposing your dataframe.

import numpy as np
import pandas as pd

np.random.seed(0)

n = 5
data = np.stack((np.random.choice([1, 2, 3, 4], n), np.random.rand(n), np.random.rand(n), np.random.rand(n)), axis=0)
df = pd.DataFrame(data)

df.head():

0   1   2   3   4
0   1.000000    4.000000    2.000000    1.000000    4.000000
1   0.857946    0.847252    0.623564    0.384382    0.297535
2   0.056713    0.272656    0.477665    0.812169    0.479977
3   0.392785    0.836079    0.337396    0.648172    0.368242

df_transpose = df.transpose()
df_transpose.columns = ['group'] + list(df_transpose.columns.values[1:])

df_transpose.head():

group   1   2   3
0   1.0 0.857946    0.056713    0.392785
1   4.0 0.847252    0.272656    0.836079
2   2.0 0.623564    0.477665    0.337396
3   1.0 0.384382    0.812169    0.648172
4   4.0 0.297535    0.479977    0.368242

list_df = dict()

for group in set(df_transpose['group']):
    list_df[group] = df_transpose.loc[df_transpose['group'] == group]

list_df:

{
 1.0:
     group         1         2         3
     0    1.0  0.857946  0.056713  0.392785
     3    1.0  0.384382  0.812169  0.648172,
 2.0:
     group         1         2         3
     2    2.0  0.623564  0.477665  0.337396,
 4.0:
     group         1         2         3
     1    4.0  0.847252  0.272656  0.836079
     4    4.0  0.297535  0.479977  0.368242
}

Individual dataframes:

list_df[1.0], list_df[2.0], list_df[3.0] (which does not exist in this example), list_df[4.0]

(BTW. I defined a wrong variable name. The name should be dict_df, not list_df.)

回复收藏 0 原文

流星番茄 2025-02-21 01:30:36

首先，在导入数据框后，按顺序对第一行的值进行排序，

df
Out[26]: 
   0  1  2  3
0  0  1  0  1
1  1  2  5  8
2  2  3  6  9
3  3  4  7  0

df = df.sort_values(by = 0, axis = 1)

Out[30]:
   0  2  1  3
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

您应该具有带有第1行值的数据框架。之后，您可以使用df.iloc重命名您的列名称，然后删除第一行。

df.columns = df.iloc[0]

0  0  0  1  1
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

df.drop(0, inplace=True)

df.drop(0)
Out[51]: 
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

最终，您可以根据列名进行切片。

df_1 = df[1]

df_1
Out[56]: 
0  1  1
1  2  8
2  3  9
3  4  0

First of all, after importing the dataframe, sort the value of the first row in order

df
Out[26]: 
   0  1  2  3
0  0  1  0  1
1  1  2  5  8
2  2  3  6  9
3  3  4  7  0

df = df.sort_values(by = 0, axis = 1)

Out[30]:
   0  2  1  3
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

You should have a dataframe with ordered 1st row values. After that, you can use df.iloc to rename your column name and you will drop the first row.

df.columns = df.iloc[0]

0  0  0  1  1
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

df.drop(0, inplace=True)

df.drop(0)
Out[51]: 
0  0  0  1  1
1  1  5  2  8
2  2  6  3  9
3  3  7  4  0

Eventually, you can do slicing based on the column name.

df_1 = df[1]

df_1
Out[56]: 
0  1  1
1  2  8
2  3  9
3  4  0

回复收藏 0 原文

~没有更多了~

关于作者

罗罗贝儿

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

按行值分开数据框

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

按行值分开数据框

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。