特定列的熊猫中的vlookup

发布于 2025-02-07 15:31:34 字数 3632 浏览 2 评论 0原文

我在熊猫中使用合并功能面临一些困难。我正在为此寻找某种vlookup公式。但是,我无法解决问题。

我的数据很大,由于机密性,我无法在这里共享。但是,我尝试在这里提出类似的数据。

旧代码新代码名称发票日期
1001011NA奶酪蛋糕02/02/2021
1001012NACoffee03/05/2021
1001011NA奶酪蛋糕30/05/2021
NA2002093Jasmine21/08/2021
NA2002042COOKIETea
NA2002080咖啡09/01/2022
NA2002093Jasmine Tea05/05/2022
NA2002058奶酪蛋糕07/06/2022

我想在上表中的成本列输入。但是,成本非常按发票日期(也请注意更改产品代码)。我们有2个成本表。 2021年:

旧代码新代码名称JAN 21年2月21日3月21日4月21日,6月21日,6月21日,7月21日,21日8月21日,8月21日,9月, 10月21日,11月21日,12月21日
1001011200201152058奶酪蛋糕505150535452555350 52535353
10010122002080咖啡56565756565 6 66
10010152002093茉莉茶433443333 333 3 34
1001020200202002042cookies2020 2120 2022 222220212020 22202122

和2022年:

旧代码新代码名称Jan-22Feb-22Mar-224月22日,4月22日,6月22日,7月22日,8月22日,8月22日,9月22日,10月22日,10月22日22Dec-22
10010112002058奶酪蛋糕525255555652Na Na Na Na Na Na Na Na Na NaNaNaNa Na NaNa NaNa
10010152002093Na Na Na Na Na Na Na Na Na Na Na Na5 5.5NaNaNaNaNaNa NaNaNa NaNa Na Na NaNa NaNa Na
5.55 5.5 5 5.5Jasmine Tea4335 5.5 55.5 55.555.55 5.55Na NaNA
10010202002042Cookies2222232323.523NANANANANANA

So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month.

示例:

03/05/2021 = May_2021的发票日期成本

I am facing some difficulties using merge function in Pandas. I am looking for some kind of Vlookup formula for this. However, I couldn't solve my problem.

My data is huge and I couldn't share here due to confidentiality. However, I try to came up with similar data here.

Old CodeNew CodeNameInvoice Date
1001011NACheese Cake02/02/2021
1001012NACoffee03/05/2021
1001011NACheese Cake30/05/2021
NA2002093Jasmine Tea21/08/2021
NA2002042Cookies31/12/2021
NA2002080Coffee09/01/2022
NA2002093Jasmine Tea05/05/2022
NA2002058Cheese Cake07/06/2022

I would like to have a COST Column input in my table above. However, the cost is very by invoice date (Also take note on the changing of product code). We have 2 cost table.
For year 2021:

Old CodeNew CodeNameJan-21Feb-21Mar-21Apr-21May-21June-21Jul-21Aug-21Sep-21Oct-21Nov-21Dec-21
10010112002058Cheese Cake505150535452555350525353
10010122002080Coffee565665756566
10010152002093Jasmine Tea433443533334
10010202002042Cookies202021202220212022202122

And also for Year 2022:

Old CodeNew CodeNameJan-22Feb-22Mar-22Apr-22May-22June-22Jul-22Aug-22Sep-22Oct-22Nov-22Dec-22
10010112002058Cheese Cake525255555652NANANANANANA
10010122002080Coffee565666.5NANANANANANA
10010152002093Jasmine Tea433555.5NANANANANANA
10010202002042Cookies2222232323.523NANANANANANA

So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month.

Example:

Invoice Date Costing for 03/05/2021 = May_2021

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

鹿港巷口少年归 2025-02-14 15:31:34

合并时,您需要双方都有一个月和代码编号,因此:

  1. 在发票数据框架中创建一个年度月份的列,该列与成本表相一致
  2. ,将两个成本表分别
  3. 与新代码和旧代码合并在一起


import pandas as pd
import io
import datetime

invoice_data_text = '''Old Code New Code    Name    Invoice Date
1001011 NA  Cheese Cake 02/02/2021
1001012 NA  Coffee  03/05/2021
1001011 NA  Cheese Cake 30/05/2021
NA  2002093 Jasmine Tea 21/08/2021
NA  2002042 Cookies 31/12/2021
NA  2002080 Coffee  09/01/2022
NA  2002093 Jasmine Tea 05/05/2022
NA  2002058 Cheese Cake 07/06/2022
'''

cost_2021_text = '''
Old Code    New Code    Name    Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21 Jul-21  Aug-21  Sep-21  Oct-21  Nov-21  Dec-21
1001011 2002058 Cheese Cake 50  51  50  53  54  52  55  53  50  52  53  53
1001012 2002080 Coffee  5   6   5   6   6   5   7   5   6   5   6   6
1001015 2002093 Jasmine Tea 4   3   3   4   4   3   5   3   3   3   3   4
1001020 2002042 Cookies 20  20  21  20  22  20  21  20  22  20  21  22
'''

cost_2022_text = '''
Old Code    New Code    Name    Jan-22  Feb-22  Mar-22  Apr-22  May-22  June-22 Jul-22  Aug-22  Sep-22  Oct-22  Nov-22  Dec-22
1001011 2002058 Cheese Cake 52  52  55  55  56  52  NA  NA  NA  NA  NA  NA
1001012 2002080 Coffee  5   6   5   6   6   6.5 NA  NA  NA  NA  NA  NA
1001015 2002093 Jasmine Tea 4   3   3   5   5   5.5 NA  NA  NA  NA  NA  NA
1001020 2002042 Cookies 22  22  23  23  23.5    23  NA  NA  NA  NA  NA  NA
'''

# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="\t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='\t')
cost22 =  pd.read_csv(io.StringIO(cost_2022_text),sep='\t')

# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))

# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)

# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")

# Combine result
pd.concat([old_code_result,new_code_result])
    

You need to have the month and code number on both sides when merging, so:

  1. Create a year-month column in the invoice dataframe that is consistent with the cost table
  2. Combine two cost tables
  3. Merge with new code and old code respectively


import pandas as pd
import io
import datetime

invoice_data_text = '''Old Code New Code    Name    Invoice Date
1001011 NA  Cheese Cake 02/02/2021
1001012 NA  Coffee  03/05/2021
1001011 NA  Cheese Cake 30/05/2021
NA  2002093 Jasmine Tea 21/08/2021
NA  2002042 Cookies 31/12/2021
NA  2002080 Coffee  09/01/2022
NA  2002093 Jasmine Tea 05/05/2022
NA  2002058 Cheese Cake 07/06/2022
'''

cost_2021_text = '''
Old Code    New Code    Name    Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21 Jul-21  Aug-21  Sep-21  Oct-21  Nov-21  Dec-21
1001011 2002058 Cheese Cake 50  51  50  53  54  52  55  53  50  52  53  53
1001012 2002080 Coffee  5   6   5   6   6   5   7   5   6   5   6   6
1001015 2002093 Jasmine Tea 4   3   3   4   4   3   5   3   3   3   3   4
1001020 2002042 Cookies 20  20  21  20  22  20  21  20  22  20  21  22
'''

cost_2022_text = '''
Old Code    New Code    Name    Jan-22  Feb-22  Mar-22  Apr-22  May-22  June-22 Jul-22  Aug-22  Sep-22  Oct-22  Nov-22  Dec-22
1001011 2002058 Cheese Cake 52  52  55  55  56  52  NA  NA  NA  NA  NA  NA
1001012 2002080 Coffee  5   6   5   6   6   6.5 NA  NA  NA  NA  NA  NA
1001015 2002093 Jasmine Tea 4   3   3   5   5   5.5 NA  NA  NA  NA  NA  NA
1001020 2002042 Cookies 22  22  23  23  23.5    23  NA  NA  NA  NA  NA  NA
'''

# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="\t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='\t')
cost22 =  pd.read_csv(io.StringIO(cost_2022_text),sep='\t')

# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))

# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)

# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")

# Combine result
pd.concat([old_code_result,new_code_result])
    

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文