为每个客户添加缺少的行-Python / Pandas

发布于 2025-01-18 04:52:29 字数 3907 浏览 0 评论 0原文

我有周、月和年的 df 。

week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]

myDict = {}

myDict["Week"] = week
myDict["Month"] = month 
myDict["Year"] = year

dates_df = pd.DataFrame(data=myDict)
dates_df
         Week              Month    Year

01/03/2022 - 01/09/2022   January   2022
01/10/2022 - 01/16/2022   January   2022
01/17/2022 - 01/23/2022   January   2022
01/24/2022 - 01/30/2022   January   2022

数据看起来像这样(以前分组):

test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
    'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
    'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],    
    'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
            '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
            '01/24/2022 - 01/30/2022'], 
    'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'], 
    'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
    'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}

test_df = pd.DataFrame(data=test_data)
test_df
Client Id    Client Name          City                Week               Month    Year   Spent

1            Tom Holland       New York      01/03/2022 - 01/09/2022    January   2022   1000
1            Tom Holland       New York      01/10/2022 - 01/16/2022    January   2022   750
1            Tom Holland       New York      01/17/2022 - 01/23/2022    January   2022   1200
1            Tom Holland       New York      01/24/2022 - 01/30/2022    January   2022   850
2            Brad Pitt         Los Angeles   01/03/2022 - 01/09/2022    January   2022   777
2            Brad Pitt         Los Angeles   01/10/2022 - 01/16/2022    January   2022   1500
2            Brad Pitt         Los Angeles   01/24/2022 - 01/30/2022    January   2022   1200
3            Anna Delvey       New York      01/03/2022 - 01/09/2022    January   2022   1500
3            Anna Delvey       New York      01/24/2022 - 01/30/2022    January   2022   1200

我需要为每个缺少周的客户(Id)创建客户名称、城市、周、月和年的附加行,在这种情况下,从日期 df + Spent 列应该为 0。

我需要的输出:

Client Id    Client Name          City                  Week               Month    Year   Spent

1            Tom Holland        New York       01/03/2022 - 01/09/2022    January   2022   1000
1            Tom Holland        New York       01/10/2022 - 01/16/2022    January   2022   750
1            Tom Holland        New York       01/17/2022 - 01/23/2022    January   2022   1200
1            Tom Holland        New York       01/24/2022 - 01/30/2022    January   2022   850
2            Brad Pitt          Los Angeles    01/03/2022 - 01/09/2022    January   2022   777
2            Brad Pitt          Los Angeles    01/10/2022 - 01/16/2022    January   2022   1500
2            Brad Pitt          Los Angeles    01/17/2022 - 01/23/2022    January   2022   0
2            Brad Pitt          Los Angeles    01/24/2022 - 01/30/2022    January   2022   1200
3            Anna Delvey        New York       01/03/2022 - 01/09/2022    January   2022   1500
3            Anna Delvey        New York       01/10/2022 - 01/16/2022    January   2022   0
3            Anna Delvey        New York       01/17/2022 - 01/23/2022    January   2022   0
3            Anna Delvey        New York       01/24/2022 - 01/30/2022    January   2022   1200

我试图对其进行编码,但没有运气。

I have df with Weeks, Months and Years.

week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]

myDict = {}

myDict["Week"] = week
myDict["Month"] = month 
myDict["Year"] = year

dates_df = pd.DataFrame(data=myDict)
dates_df
         Week              Month    Year

01/03/2022 - 01/09/2022   January   2022
01/10/2022 - 01/16/2022   January   2022
01/17/2022 - 01/23/2022   January   2022
01/24/2022 - 01/30/2022   January   2022

Data looks like that (Previously Grouped):

test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
    'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
    'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],    
    'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
            '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
            '01/24/2022 - 01/30/2022'], 
    'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'], 
    'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
    'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}

test_df = pd.DataFrame(data=test_data)
test_df
Client Id    Client Name          City                Week               Month    Year   Spent

1            Tom Holland       New York      01/03/2022 - 01/09/2022    January   2022   1000
1            Tom Holland       New York      01/10/2022 - 01/16/2022    January   2022   750
1            Tom Holland       New York      01/17/2022 - 01/23/2022    January   2022   1200
1            Tom Holland       New York      01/24/2022 - 01/30/2022    January   2022   850
2            Brad Pitt         Los Angeles   01/03/2022 - 01/09/2022    January   2022   777
2            Brad Pitt         Los Angeles   01/10/2022 - 01/16/2022    January   2022   1500
2            Brad Pitt         Los Angeles   01/24/2022 - 01/30/2022    January   2022   1200
3            Anna Delvey       New York      01/03/2022 - 01/09/2022    January   2022   1500
3            Anna Delvey       New York      01/24/2022 - 01/30/2022    January   2022   1200

I need to create additional rows of Client Name, City, Week, Month and Year for each Client (Id) which missing Week, from the date df + Spent column in this case should be 0.

Output I need:

Client Id    Client Name          City                  Week               Month    Year   Spent

1            Tom Holland        New York       01/03/2022 - 01/09/2022    January   2022   1000
1            Tom Holland        New York       01/10/2022 - 01/16/2022    January   2022   750
1            Tom Holland        New York       01/17/2022 - 01/23/2022    January   2022   1200
1            Tom Holland        New York       01/24/2022 - 01/30/2022    January   2022   850
2            Brad Pitt          Los Angeles    01/03/2022 - 01/09/2022    January   2022   777
2            Brad Pitt          Los Angeles    01/10/2022 - 01/16/2022    January   2022   1500
2            Brad Pitt          Los Angeles    01/17/2022 - 01/23/2022    January   2022   0
2            Brad Pitt          Los Angeles    01/24/2022 - 01/30/2022    January   2022   1200
3            Anna Delvey        New York       01/03/2022 - 01/09/2022    January   2022   1500
3            Anna Delvey        New York       01/10/2022 - 01/16/2022    January   2022   0
3            Anna Delvey        New York       01/17/2022 - 01/23/2022    January   2022   0
3            Anna Delvey        New York       01/24/2022 - 01/30/2022    January   2022   1200

I was trying to code it, but got no luck.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

单调的奢华 2025-01-25 04:52:29

您可以pivot + reindex + fillna(获取丢失的数据)+ stack(返回到前一个形状):

columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
       .reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T, 
                                          names=[None]+columns), axis=1)
       .fillna(0).stack(level=columns).reset_index())

输出:

    CLient Id  Client Name         City                     Week    Month  Year   Spent  
0           1  Tom Holland     New York  01/03/2022 - 01/09/2022  January  2022  1000.0  
1           1  Tom Holland     New York  01/10/2022 - 01/16/2022  January  2022   750.0  
2           1  Tom Holland     New York  01/17/2022 - 01/23/2022  January  2022  1200.0  
3           1  Tom Holland     New York  01/24/2022 - 01/30/2022  January  2022   850.0  
4           2    Brad Pitt  Los Angeles  01/03/2022 - 01/09/2022  January  2022   777.0  
5           2    Brad Pitt  Los Angeles  01/10/2022 - 01/16/2022  January  2022  1500.0  
6           2    Brad Pitt  Los Angeles  01/17/2022 - 01/23/2022  January  2022     0.0  
7           2    Brad Pitt  Los Angeles  01/24/2022 - 01/30/2022  January  2022  1200.0  
8           3  Anna Delvey     New York  01/03/2022 - 01/09/2022  January  2022  1500.0  
9           3  Anna Delvey     New York  01/10/2022 - 01/16/2022  January  2022     0.0  
10          3  Anna Delvey     New York  01/17/2022 - 01/23/2022  January  2022     0.0  
11          3  Anna Delvey     New York  01/24/2022 - 01/30/2022  January  2022  1200.0  

如果有多于一列需要填充 0,则可以使用:

columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
          .merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
       .reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()), 
                                          names=[None] + columns), axis=1)
       .fillna(0).stack(level=columns).reset_index())

You could pivot + reindex + fillna (to get the missing data) + stack (to get back to the previous shape):

columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
       .reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T, 
                                          names=[None]+columns), axis=1)
       .fillna(0).stack(level=columns).reset_index())

Output:

    CLient Id  Client Name         City                     Week    Month  Year   Spent  
0           1  Tom Holland     New York  01/03/2022 - 01/09/2022  January  2022  1000.0  
1           1  Tom Holland     New York  01/10/2022 - 01/16/2022  January  2022   750.0  
2           1  Tom Holland     New York  01/17/2022 - 01/23/2022  January  2022  1200.0  
3           1  Tom Holland     New York  01/24/2022 - 01/30/2022  January  2022   850.0  
4           2    Brad Pitt  Los Angeles  01/03/2022 - 01/09/2022  January  2022   777.0  
5           2    Brad Pitt  Los Angeles  01/10/2022 - 01/16/2022  January  2022  1500.0  
6           2    Brad Pitt  Los Angeles  01/17/2022 - 01/23/2022  January  2022     0.0  
7           2    Brad Pitt  Los Angeles  01/24/2022 - 01/30/2022  January  2022  1200.0  
8           3  Anna Delvey     New York  01/03/2022 - 01/09/2022  January  2022  1500.0  
9           3  Anna Delvey     New York  01/10/2022 - 01/16/2022  January  2022     0.0  
10          3  Anna Delvey     New York  01/17/2022 - 01/23/2022  January  2022     0.0  
11          3  Anna Delvey     New York  01/24/2022 - 01/30/2022  January  2022  1200.0  

If you have more than one columns to fill with 0, you could use:

columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
          .merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
       .reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()), 
                                          names=[None] + columns), axis=1)
       .fillna(0).stack(level=columns).reset_index())
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文