Create a DataFrame column using another column as the source variable suffix

Posted on 2025-01-12 03:38:07

Difficult to title, so apologies for that...
Here is some example data:

region   FC_EA   FC_EM   FC_GL   FC_XX   FC_YY  ...
  GL       4       2       8       6       1    ...
  YY       9       7       2       1       3    ...

There are many columns with a suffix, hence the ...

[edit] And there are many other columns. I want to keep all columns.

The aim is to create a column called FC that holds, for each row, the value from the FC_ column whose suffix matches that row's region value.
So, for this data, the resulting column would be:

FC
8
3

I currently have a couple of ways to do this. One uses minimal code (perhaps fine for a small dataset):

df['FC'] = df.apply(lambda x: x['FC_'+x.region], axis=1)

Another is a set of nested np.where calls, which I am told is faster for large datasets:

df['FC'] = np.where(df.region=='EA', df.FC_EA,
             np.where(df.region=='EM', df.FC_EM,
             np.where(df.region=='GL', df.FC_GL, ...

Can anyone suggest a better way to do this than these two options?

Thanks!

刘备忘录 2025-01-19 03:38:07

You could use melt:

(df.melt(id_vars='region', value_name='FC')
   .loc[lambda d: d['region'].eq(d['variable'].str[3:]), ['region', 'FC']]
)
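Since the question asks to keep all columns, here is a minimal sketch of one way the melt idea could be adapted to write the result back onto the original frame. It assumes the index of df is unique, that every source column starts with the prefix FC_, and a pandas version that supports the ignore_index argument of melt; the sample frame and the extra column named other are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data based on the question, plus one unrelated column.
df = pd.DataFrame({
    "region": ["GL", "YY"],
    "FC_EA": [4, 9],
    "FC_EM": [2, 7],
    "FC_GL": [8, 2],
    "FC_XX": [6, 1],
    "FC_YY": [1, 3],
    "other": ["a", "b"],
})

# Melt only the FC_ columns so unrelated columns are left alone.
fc_cols = [c for c in df.columns if c.startswith("FC_")]

# ignore_index=False keeps the original row labels on the long frame.
long = df.melt(id_vars="region", value_vars=fc_cols,
               value_name="FC", ignore_index=False)

# Keep the rows whose column suffix equals the row's region.
match = long["region"].eq(long["variable"].str[3:])

# Assignment aligns on the (unique) original index.
df["FC"] = long.loc[match, "FC"]
```

Rows whose region has no matching FC_ column would end up with NaN in FC under this sketch.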

or use apply (probably considerably slower):

df['FC'] = (df.set_index('region')
              .apply(lambda r: r.loc[f'FC_{r.name}'], axis=1)
              .values
            )

Output of the melt approach:

  region  FC
4     GL   8
9     YY   3
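A further option, not part of the answer above, is a vectorized lookup built from pd.factorize and NumPy integer indexing. The sketch below assumes every region value has a matching FC_ column and that region contains no missing values (a missing value would get code -1 and silently pick the last column); the sample frame and the column named other are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data based on the question, plus one unrelated column.
df = pd.DataFrame({
    "region": ["GL", "YY"],
    "FC_EA": [4, 9],
    "FC_EM": [2, 7],
    "FC_GL": [8, 2],
    "FC_XX": [6, 1],
    "FC_YY": [1, 3],
    "other": ["a", "b"],
})

# Build the target column name for each row, then encode it as integers.
# codes[i] is the position of row i's column name within uniques.
codes, uniques = pd.factorize("FC_" + df["region"])

# Select only the columns that are actually needed, in uniques order.
values = df.reindex(columns=uniques).to_numpy()

# For each row i, take the value in column codes[i].
df["FC"] = values[np.arange(len(df)), codes]
```

This avoids a Python-level function call per row, so it would be expected to scale better than apply, although that is worth timing on the real data.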
