How to change the text in a column, then combine the first row with the column headers?

Posted 2025-02-13 15:19:36, 1095 characters, 2 views, 0 comments

I have the following dataframe:

import pandas as pd

data = {'a1': ['X1', 2, 3, 4, 5], 'Unnamed: 02': ['Y1', 5, 6, 7, 8], 'b1': ['X2', 5, 3, 7, 9], 'Unnamed: 05': ['Y2', 5, 8, 9, 3], 'c1': ['X3', 4, 5, 7, 5], 'Unnamed: 07': ['Y3', 5, 8, 9, 3], 'd1': ['P', 2, 4, 5, 7], 'Unnamed: 09': ['M', 8, 4, 6, 7]}
df = pd.DataFrame(data)
# replace each 'Unnamed: ...' header with the nearest named header to its left
df.columns = df.columns.to_series().mask(lambda x: x.str.startswith('Unnamed')).ffill()
df

There are a few things which I would like to do:

  1. Change the entries X1, X2 and X3 in the first row into just 'X' (and likewise Y1, Y2 and Y3 into 'Y').
  2. Combine the existing column header with the row containing X, Y, P, M.

The outcome should look like:

  1. The entries X1, X2 and X3 changed into just 'X' (and likewise Y1, Y2 and Y3 into 'Y').
  2. The existing column header combined with the row containing X, Y, P, M. Note that 'P' and 'M' completely replace 'd1' rather than being combined with it.

Comments (3)

揪着可爱 2025-02-20 15:19:37

Try this.

import numpy as np
# extract only the letters from first row
first_row = df.iloc[0].str.extract('([A-Z]+)')[0]
# update column names by first_row
# the columns with P and M in it have their names completely replaced
df.columns = np.where(first_row.isin(['P', 'M']), first_row, df.columns + '_' + first_row.values)
# remove first row
df = df.iloc[1:].reset_index(drop=True)
df
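To see the naming rule from this answer in isolation, here is a minimal plain-Python sketch that does not need pandas. The helper name `combine_names` and its `replace_labels` parameter are hypothetical and only for illustration; the header list is assumed to be the forward-filled one from the question.

```python
import re

def combine_names(headers, first_row, replace_labels=("P", "M")):
    """Build new column names from forward-filled headers and the first data row."""
    names = []
    for header, cell in zip(headers, first_row):
        # keep only the first run of capital letters, e.g. 'X1' -> 'X'
        match = re.search(r"[A-Z]+", str(cell))
        label = match.group(0) if match else str(cell)
        # labels listed in replace_labels replace the header completely;
        # every other label is appended to the header
        names.append(label if label in replace_labels else f"{header}_{label}")
    return names

# headers as they look after the ffill step in the question
headers = ["a1", "a1", "b1", "b1", "c1", "c1", "d1", "d1"]
first_row = ["X1", "Y1", "X2", "Y2", "X3", "Y3", "P", "M"]

new_names = combine_names(headers, first_row)
print(new_names)
```

The resulting list could then be assigned to `df.columns` before the first row is dropped, which is what the `np.where` line above does in one vectorized step.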

南笙 2025-02-20 15:19:37

Alternatively, you can do something like this:

import numpy as np

# Transpose the data frame and turn the index (the old column names) into a column
df = df.T.reset_index()
# Build the new names, using the length of each first-row entry as the condition
df["column"] = np.where(df[0].str.len() > 1, df["index"] + "_" + df[0].str[0], df[0].str[0])
# Drop the helper columns and transpose back
df = df.drop(columns=["index", 0]).set_index("column").T.rename_axis(None, axis=1)
df

----------------------------------------------------------
    a1_X    a1_Y    b1_X    b1_Y    c1_X    c1_Y    P   M
1   2       5       5       5       4       5       2   8
2   3       6       3       8       5       8       4   4
3   4       7       7       9       7       9       5   6
4   5       8       9       3       5       3       7   7

----------------------------------------------------------

This is a more general solution because it uses the length of each row-zero entry as the condition rather than the actual values 'P' and 'M', so it works for any single-character string.
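The difference between the two conditions can be sketched in plain Python. Both helper names below are hypothetical, and the extra 'Q' entry is an invented label that is not in the question's data; it is only there to show where the rules diverge. The membership rule is a simplified stand-in for the `isin(['P', 'M'])` check in the first answer.

```python
def name_by_length(header, cell):
    """Length rule: multi-character cells are combined, single characters replace."""
    cell = str(cell)
    if len(cell) > 1:
        # e.g. header 'c1', cell 'X3' -> 'c1_X' (only the first character is kept)
        return f"{header}_{cell[0]}"
    return cell

def name_by_membership(header, cell, replace_labels=("P", "M")):
    """Membership rule: only the listed labels replace the header."""
    cell = str(cell)
    if cell in replace_labels:
        return cell
    return f"{header}_{cell[0]}"

headers = ["c1", "c1", "d1", "d1", "d1"]
first_row = ["X3", "Y3", "P", "M", "Q"]  # 'Q' is a made-up extra label

by_length = [name_by_length(h, c) for h, c in zip(headers, first_row)]
by_membership = [name_by_membership(h, c) for h, c in zip(headers, first_row)]
print(by_length)
print(by_membership)
```

Under the length rule 'Q' replaces its header just like 'P' and 'M', while the membership rule would name that column 'd1_Q'. One caveat of the length rule is that it keeps only the first character, so a multi-letter label such as 'XY1' would be shortened to 'X'.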

寻找一个思念的角度 2025-02-20 15:19:37
df.columns = [x + '_' + y[0] if len(y)>1 else y for x, y in df.iloc[0].reset_index().values]
df = df[1:].reset_index(drop=True)
print(df)

Output:

  a1_X a1_Y b1_X b1_Y c1_X c1_Y  P  M
0    2    5    5    5    4    5  2  8
1    3    6    3    8    5    8  4  4
2    4    7    7    9    7    9  5  6
3    5    8    9    3    5    3  7  7