Add missing rows for each client - Python / Pandas
I have a DataFrame with weeks, months and years:
import pandas as pd

week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
# Collect the three columns in a dict, then build the DataFrame from it
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df
Week                     Month    Year
01/03/2022 - 01/09/2022  January  2022
01/10/2022 - 01/16/2022  January  2022
01/17/2022 - 01/23/2022  January  2022
01/24/2022 - 01/30/2022  January  2022
The data looks like this (previously grouped):
test_data = {'Client Id': [1,1,1,1,2,2,2,3,3],
             'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
             'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
             'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
                      '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
                      '01/24/2022 - 01/30/2022'],
             'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
             'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
             'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_df
Client Id  Client Name  City         Week                     Month    Year  Spent
1          Tom Holland  New York     01/03/2022 - 01/09/2022  January  2022  1000
1          Tom Holland  New York     01/10/2022 - 01/16/2022  January  2022  750
1          Tom Holland  New York     01/17/2022 - 01/23/2022  January  2022  1200
1          Tom Holland  New York     01/24/2022 - 01/30/2022  January  2022  850
2          Brad Pitt    Los Angeles  01/03/2022 - 01/09/2022  January  2022  777
2          Brad Pitt    Los Angeles  01/10/2022 - 01/16/2022  January  2022  1500
2          Brad Pitt    Los Angeles  01/24/2022 - 01/30/2022  January  2022  1200
3          Anna Delvey  New York     01/03/2022 - 01/09/2022  January  2022  1500
3          Anna Delvey  New York     01/24/2022 - 01/30/2022  January  2022  1200
For each client (Id) that is missing a week, I need to add a row with that client's Client Name and City and with the Week, Month and Year taken from dates_df. The Spent column should be 0 in these added rows.
The output I need:
Client Id  Client Name  City         Week                     Month    Year  Spent
1          Tom Holland  New York     01/03/2022 - 01/09/2022  January  2022  1000
1          Tom Holland  New York     01/10/2022 - 01/16/2022  January  2022  750
1          Tom Holland  New York     01/17/2022 - 01/23/2022  January  2022  1200
1          Tom Holland  New York     01/24/2022 - 01/30/2022  January  2022  850
2          Brad Pitt    Los Angeles  01/03/2022 - 01/09/2022  January  2022  777
2          Brad Pitt    Los Angeles  01/10/2022 - 01/16/2022  January  2022  1500
2          Brad Pitt    Los Angeles  01/17/2022 - 01/23/2022  January  2022  0
2          Brad Pitt    Los Angeles  01/24/2022 - 01/30/2022  January  2022  1200
3          Anna Delvey  New York     01/03/2022 - 01/09/2022  January  2022  1500
3          Anna Delvey  New York     01/10/2022 - 01/16/2022  January  2022  0
3          Anna Delvey  New York     01/17/2022 - 01/23/2022  January  2022  0
3          Anna Delvey  New York     01/24/2022 - 01/30/2022  January  2022  1200
I tried to code it, but had no luck.
Comments (1)
You could use pivot + reindex + fillna (to get the missing data), then stack (to get back to the previous shape).
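The pivot, reindex, fillna and stack chain can be sketched roughly as follows. This is a minimal sketch, not necessarily the code the answer originally used. It assumes the question's test_df and dates_df (with the key column spelled 'Client Id') and pandas 1.1 or later, pivots on Week only, and adds a final merge to bring Month and Year back from dates_df. The names weeks, id_cols, wide and filled are illustrative.

```python
import pandas as pd

weeks = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
         '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']

# Same data as in the question; scalar values are broadcast to every row
dates_df = pd.DataFrame({'Week': weeks, 'Month': 'January', 'Year': 2022})
test_df = pd.DataFrame({
    'Client Id': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'Client Name': ['Tom Holland'] * 4 + ['Brad Pitt'] * 3 + ['Anna Delvey'] * 2,
    'City': ['New York'] * 4 + ['Los Angeles'] * 3 + ['New York'] * 2,
    'Week': [weeks[0], weeks[1], weeks[2], weeks[3],
             weeks[0], weeks[1], weeks[3],
             weeks[0], weeks[3]],
    'Month': 'January',
    'Year': 2022,
    'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200],
})

id_cols = ['Client Id', 'Client Name', 'City']  # assumed to identify a client

# pivot: one row per client, one column per week (NaN where a week is missing)
wide = test_df.pivot(index=id_cols, columns='Week', values='Spent')

# reindex: make sure every week from dates_df is present as a column
wide = wide.reindex(columns=pd.Index(dates_df['Week'], name='Week'))

# fillna: missing weeks get 0; NaN forced floats, so cast back to int
wide = wide.fillna(0).astype(int)

# stack: back to one row per client and week
filled = wide.stack().rename('Spent').reset_index()

# Month and Year were left out of the pivot, so look them up in dates_df
filled = filled.merge(dates_df, on='Week', how='left')[list(test_df.columns)]

print(filled)
```

With the question's data, filled should contain twelve rows, with Spent equal to 0 for Brad Pitt in the week of 01/17 and for Anna Delvey in the weeks of 01/10 and 01/17.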
If you have more than one column to fill with 0, you could use:
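One possible way to handle several value columns is sketched below. Note that it swaps the pivot chain for a different technique: it builds every client and week combination with a cross merge, left-merges the existing data onto that grid, and fills the gaps. The Visits column and its values are hypothetical, added only so there is a second column to fill, and the names grid, full and value_cols are illustrative. merge(how='cross') needs pandas 1.2 or later.

```python
import pandas as pd

weeks = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
         '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']

dates_df = pd.DataFrame({'Week': weeks, 'Month': 'January', 'Year': 2022})
test_df = pd.DataFrame({
    'Client Id': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'Client Name': ['Tom Holland'] * 4 + ['Brad Pitt'] * 3 + ['Anna Delvey'] * 2,
    'City': ['New York'] * 4 + ['Los Angeles'] * 3 + ['New York'] * 2,
    'Week': [weeks[0], weeks[1], weeks[2], weeks[3],
             weeks[0], weeks[1], weeks[3],
             weeks[0], weeks[3]],
    'Month': 'January',
    'Year': 2022,
    'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200],
    'Visits': [3, 2, 4, 1, 2, 5, 3, 4, 2],  # hypothetical second value column
})

id_cols = ['Client Id', 'Client Name', 'City']
date_cols = ['Week', 'Month', 'Year']
value_cols = ['Spent', 'Visits']  # every column that should default to 0

# One row per distinct client
clients = test_df[id_cols].drop_duplicates()

# Cross merge: every client paired with every week in dates_df
grid = clients.merge(dates_df, how='cross')

# Attach the known values; client/week pairs without data get NaN
full = grid.merge(test_df, on=id_cols + date_cols, how='left')

# Fill all value columns at once and restore integer dtype
full[value_cols] = full[value_cols].fillna(0).astype(int)

print(full)
```

Because the list of columns to fill lives in value_cols, adding another value column only means extending that list.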