Reshape a wide DataFrame, then add columns based on another column's name
I need to use some of the column names as values in the DataFrame. While keeping the first three columns as they are, I need to create other columns based on the content of each row.
Here are some transactions from several customers:
   cust_id  cust_first  cust_last  au_zo  au_zo_pay  fi_gu  fi_gu_pay      wa  wa_pay
0     1000      Andrew      Jones  50.85      debit    NaN        NaN   69.12   debit
1     1001      Fatima        Lee    NaN        NaN  18.16      debit     NaN     NaN
2     1002      Sophia      Lewis    NaN        NaN    NaN        NaN  159.54  credit
3     1003      Edward       Bush  45.29     credit  59.63     credit     NaN     NaN
4     1004        Mark      Nunez  20.87     credit  20.87     credit   86.18   debit
First, I need to add a new column, 'city'. It is not in the database, so it defaults to 'New York' (that's easy!).
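A minimal sketch of this step, assuming the sample table above is rebuilt by hand as a pandas DataFrame (the data literal is typed in from the question). `DataFrame.insert` puts the constant column right after `cust_id`, which matches the column order of the desired output.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data typed in from the question's table
df = pd.DataFrame(
    [[1000, "Andrew", "Jones", 50.85, "debit", nan, nan, 69.12, "debit"],
     [1001, "Fatima", "Lee", nan, nan, 18.16, "debit", nan, nan],
     [1002, "Sophia", "Lewis", nan, nan, nan, nan, 159.54, "credit"],
     [1003, "Edward", "Bush", 45.29, "credit", 59.63, "credit", nan, nan],
     [1004, "Mark", "Nunez", 20.87, "credit", 20.87, "credit", 86.18, "debit"]],
    columns=["cust_id", "cust_first", "cust_last", "au_zo", "au_zo_pay",
             "fi_gu", "fi_gu_pay", "wa", "wa_pay"],
)

# The scalar is broadcast to every row; loc=1 places 'city' right after 'cust_id'
df.insert(1, "city", "New York")
print(df)
```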
But here is where I am getting stuck:
Add a new column, 'store', that holds where each transaction took place: au_zo --> auto zone, fi_gu --> five guys, wa --> walmart.
Add a new column, 'classification', based on the store added above: auto zone --> auto-repair, five guys --> food, walmart --> groceries.
Column 'amount' holds the transaction amount for that customer and store.
Column 'trans_type' holds the corresponding value of au_zo_pay, fi_gu_pay, or wa_pay.
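Both mappings are plain lookups, so once the data has one row per transaction they can be written as dictionaries and applied with `Series.map`. A small sketch; the dictionary names and the `code` column (holding the original wide column name such as `au_zo`) are hypothetical.

```python
import pandas as pd

# Hypothetical lookup tables built from the mappings listed in the question
STORE_NAMES = {"au_zo": "auto zone", "fi_gu": "five guys", "wa": "walmart"}
CLASSIFICATION = {"auto zone": "auto-repair", "five guys": "food", "walmart": "groceries"}

# Toy long-format frame: 'code' is the original wide column name
long_df = pd.DataFrame({"code": ["au_zo", "wa", "fi_gu"]})
long_df["store"] = long_df["code"].map(STORE_NAMES)
long_df["classification"] = long_df["store"].map(CLASSIFICATION)
print(long_df)
```

With `map`, any code missing from the dictionary becomes NaN, which makes unmapped stores easy to spot.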
So at the end it looks like this:
   cust_id      city  cust_first  cust_last      store  classification  amount  trans_type
0     1000  New York      Andrew      Jones  auto zone     auto-repair   50.85       debit
1     1000  New York      Andrew      Jones    walmart       groceries   69.12       debit
2     1001  New York      Fatima        Lee  five guys            food   18.16       debit
3     1002  New York      Sophia      Lewis    walmart       groceries  159.54      credit
4     1003  New York      Edward       Bush  auto zone     auto-repair   45.29      credit
5     1003  New York      Edward       Bush  five guys            food   59.63      credit
6     1004  New York        Mark      Nunez  auto zone     auto-repair   20.87      credit
7     1004  New York        Mark      Nunez  five guys            food   20.87      credit
8     1004  New York        Mark      Nunez    walmart       groceries   86.18       debit
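One way to get exactly this shape without `melt` is to slice the frame once per store, rename each slice to the common `amount`/`trans_type` names, and stack the slices with `pd.concat`. This is a sketch of that swapped-in technique, not code from the question; it assumes every store always follows the `<code>` / `<code>_pay` column naming pattern, and the lookup dictionaries are hypothetical names.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[1000, "Andrew", "Jones", 50.85, "debit", nan, nan, 69.12, "debit"],
     [1001, "Fatima", "Lee", nan, nan, 18.16, "debit", nan, nan],
     [1002, "Sophia", "Lewis", nan, nan, nan, nan, 159.54, "credit"],
     [1003, "Edward", "Bush", 45.29, "credit", 59.63, "credit", nan, nan],
     [1004, "Mark", "Nunez", 20.87, "credit", 20.87, "credit", 86.18, "debit"]],
    columns=["cust_id", "cust_first", "cust_last", "au_zo", "au_zo_pay",
             "fi_gu", "fi_gu_pay", "wa", "wa_pay"],
)

BASE = ["cust_id", "cust_first", "cust_last"]
# Hypothetical lookup tables from the question's mappings
STORE_NAMES = {"au_zo": "auto zone", "fi_gu": "five guys", "wa": "walmart"}
CLASSIFICATION = {"auto zone": "auto-repair", "five guys": "food", "walmart": "groceries"}

parts = []
for code, store in STORE_NAMES.items():
    # One slice per store: its amount column plus the matching *_pay column
    part = (df[BASE + [code, f"{code}_pay"]]
            .rename(columns={code: "amount", f"{code}_pay": "trans_type"})
            .assign(store=store))
    parts.append(part)

result = (pd.concat(parts)
          .dropna(subset=["amount"])              # customer did not shop there
          .sort_values("cust_id", kind="stable")  # keep store order per customer
          .reset_index(drop=True))
result["classification"] = result["store"].map(CLASSIFICATION)
result["city"] = "New York"
result = result[["cust_id", "city", "cust_first", "cust_last",
                 "store", "classification", "amount", "trans_type"]]
print(result)
```

The stable sort matters: `pd.concat` emits all auto zone rows first, then five guys, then walmart, and a stable sort on `cust_id` keeps that store order within each customer.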
I have tried using df.melt(), but I don't get the result I'm after.
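`melt` can produce this shape if the amount columns and the `_pay` columns are melted separately. `melt` stacks the `value_vars` one after another and keeps the original row order inside each block, so when both calls list the stores in the same order, row *i* of one result lines up with row *i* of the other. A sketch under that assumption (the column lists and lookup table are typed in from the question):

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[1000, "Andrew", "Jones", 50.85, "debit", nan, nan, 69.12, "debit"],
     [1001, "Fatima", "Lee", nan, nan, 18.16, "debit", nan, nan],
     [1002, "Sophia", "Lewis", nan, nan, nan, nan, 159.54, "credit"],
     [1003, "Edward", "Bush", 45.29, "credit", 59.63, "credit", nan, nan],
     [1004, "Mark", "Nunez", 20.87, "credit", 20.87, "credit", 86.18, "debit"]],
    columns=["cust_id", "cust_first", "cust_last", "au_zo", "au_zo_pay",
             "fi_gu", "fi_gu_pay", "wa", "wa_pay"],
)

BASE = ["cust_id", "cust_first", "cust_last"]
CODES = ["au_zo", "fi_gu", "wa"]
STORE_NAMES = {"au_zo": "auto zone", "fi_gu": "five guys", "wa": "walmart"}

amounts = df.melt(id_vars=BASE, value_vars=CODES,
                  var_name="code", value_name="amount")
pays = df.melt(id_vars=BASE, value_vars=[f"{c}_pay" for c in CODES],
               value_name="trans_type")

# Same id_vars and same store order in both calls, so rows align positionally
amounts["trans_type"] = pays["trans_type"].to_numpy()

long_df = (amounts.dropna(subset=["amount"])
           .sort_values("cust_id", kind="stable")
           .reset_index(drop=True))
long_df["store"] = long_df["code"].map(STORE_NAMES)
print(long_df)
```

From here, `classification` and `city` can be added the same way as in the earlier steps.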
Is this what you want?
Note that this method has a limitation: if the relationship is not one-to-one on those four keys, it will produce an incorrect dataset.
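The answer's code is not reproduced here. As a hedged illustration of the kind of limitation it describes, the sketch below melts the amounts and the payment types separately and joins them on four keys (`cust_id`, `cust_first`, `cust_last`, `store`); this reconstruction is an assumption, not the answerer's code. Passing `validate="one_to_one"` to `DataFrame.merge` makes pandas raise `MergeError` when those keys are not unique on either side, instead of silently producing duplicated rows.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[1000, "Andrew", "Jones", 50.85, "debit", nan, nan, 69.12, "debit"],
     [1001, "Fatima", "Lee", nan, nan, 18.16, "debit", nan, nan],
     [1002, "Sophia", "Lewis", nan, nan, nan, nan, 159.54, "credit"],
     [1003, "Edward", "Bush", 45.29, "credit", 59.63, "credit", nan, nan],
     [1004, "Mark", "Nunez", 20.87, "credit", 20.87, "credit", 86.18, "debit"]],
    columns=["cust_id", "cust_first", "cust_last", "au_zo", "au_zo_pay",
             "fi_gu", "fi_gu_pay", "wa", "wa_pay"],
)

BASE = ["cust_id", "cust_first", "cust_last"]
CODES = ["au_zo", "fi_gu", "wa"]

amounts = df.melt(id_vars=BASE, value_vars=CODES,
                  var_name="store", value_name="amount")
pays = df.melt(id_vars=BASE, value_vars=[f"{c}_pay" for c in CODES],
               var_name="store", value_name="trans_type")
# 'au_zo_pay' -> 'au_zo' so both frames share the same store key
pays["store"] = pays["store"].str.replace(r"_pay$", "", regex=True)

merged = (amounts.merge(pays, on=BASE + ["store"], how="inner",
                        validate="one_to_one")  # raises MergeError on duplicate keys
          .dropna(subset=["amount"])
          .sort_values(["cust_id", "store"])
          .reset_index(drop=True))
print(merged)
```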
Try this
One other way is as follows:
df1 is exactly like df, but with the columns renamed so that the name amount comes in front of the store value. Now pivot to long format using pd.wide_to_long and do the replacement.
NB: you could consider using pivot_longer from janitor.
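A sketch of the `pd.wide_to_long` route described above. It assumes the `_pay` columns are also renamed, to a hypothetical `trans_type_` prefix, because `wide_to_long` expects each stub name at the start of the column name. `suffix=r"\w+"` is needed because the default suffix only matches digits.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[1000, "Andrew", "Jones", 50.85, "debit", nan, nan, 69.12, "debit"],
     [1001, "Fatima", "Lee", nan, nan, 18.16, "debit", nan, nan],
     [1002, "Sophia", "Lewis", nan, nan, nan, nan, 159.54, "credit"],
     [1003, "Edward", "Bush", 45.29, "credit", 59.63, "credit", nan, nan],
     [1004, "Mark", "Nunez", 20.87, "credit", 20.87, "credit", 86.18, "debit"]],
    columns=["cust_id", "cust_first", "cust_last", "au_zo", "au_zo_pay",
             "fi_gu", "fi_gu_pay", "wa", "wa_pay"],
)

BASE = ["cust_id", "cust_first", "cust_last"]
CODES = ["au_zo", "fi_gu", "wa"]
STORE_NAMES = {"au_zo": "auto zone", "fi_gu": "five guys", "wa": "walmart"}

# df1: same data, but au_zo -> amount_au_zo and au_zo_pay -> trans_type_au_zo
renames = {c: f"amount_{c}" for c in CODES}
renames.update({f"{c}_pay": f"trans_type_{c}" for c in CODES})
df1 = df.rename(columns=renames)

long_df = (pd.wide_to_long(df1, stubnames=["amount", "trans_type"],
                           i=BASE, j="store", sep="_", suffix=r"\w+")
           .dropna(subset=["amount"])
           .reset_index()
           .sort_values(["cust_id", "store"])
           .reset_index(drop=True))
# The replacement step: store codes -> store names
long_df["store"] = long_df["store"].map(STORE_NAMES)
print(long_df)
```

`wide_to_long` requires the `i` columns to identify each row uniquely, which holds here because every customer appears once in the wide table.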
One option for transforming to long form is pivot_longer from pyjanitor. It has a lot of options; for this particular use case, we pull out multiple values and multiple names (each paired with the appropriate regex) before using other pandas functions to rename and add the new columns: