如何在两个不同的Excel表中搜索两个列中的相等产品编号，并从匹配行到新表格复制某些单元

发布于 2025-02-07 22:24:24 字数 1782 浏览 1 评论 0原文

我有两个Excel表：

old_data.xlsx
产品编号名称当前价格其他列
1000产品名称1 10
AB23104产品名称2 5
430267产品名称3 20
new_data.xlsx
产品编号名称新价格其他列
AB23104更名为产品名称2 20
1000重命名的产品名称1 5
345LKT10023产品名称4 100

预期结果：下表 + 2个反馈消息

消息1）产品ID 430267在新数据文件中缺少

消息2 ）产品ID 345LKT10023是新添加的

    Product ID        Name of product    New price  Old price

    AB23104           Product name 2     20         5

    1000              Product name 1     5          10

    345LKT10023       Product name 4     100        100

我现在有此代码，但是由于我缺乏知识，它尚未工作且无法完成：

import openpyxl
import pandas as pd

new_datacols = [0, 1, 2]
old_datacols = [0, 1, 2]

new_data = pd.read_excel('new_data.xlsx', skiprows=1, usecols=new_datacols, index_col=0)
old_data = pd.read_excel('old_data.xlsx', skiprows=1, usecols=old_datacols, index_col=0)

def format_data():
    # combine_type = inner, left, right, outer
    df = pd.merge(new_data, old_data, on='Product number', how='outer')

    df = df.rename(columns={"Product number": "Product ID",
                        "Name": "Name of product",
                        "Current price": "Old price"})

    nan_value = float("NaN")

    df.replace("", nan_value, inplace=True)
    df.dropna(subset=["Name of product"], inplace=True)

    df = df[['Product ID', 'Name of product',
             'New price', 'Old price']]

    print(df.columns)

    # df.to_excel('updated_table.xlsx')


if __name__ == "__main__":
    format_data()

原文

I have two excel tables:

old_data.xlsx
Product number Name Current price Other columns
1000 Product name 1 10
AB23104 Product name 2 5
430267 Product name 3 20
new_data.xlsx
Product number Name New price Other columns
AB23104 Renamed product name 2 20
1000 Renamed product name 1 5
345LKT10023 Product name 4 100

Expected result: table below + 2 feedback messages somewhere

Message 1) Product ID 430267 is missing in new data file

Message 2) Product ID 345LKT10023 is newly added

    Product ID        Name of product    New price  Old price

    AB23104           Product name 2     20         5

    1000              Product name 1     5          10

    345LKT10023       Product name 4     100        100

I have this code for now, but it is not working and not finished due to lack of knowledge on my part:

import openpyxl
import pandas as pd

new_datacols = [0, 1, 2]
old_datacols = [0, 1, 2]

new_data = pd.read_excel('new_data.xlsx', skiprows=1, usecols=new_datacols, index_col=0)
old_data = pd.read_excel('old_data.xlsx', skiprows=1, usecols=old_datacols, index_col=0)

def format_data():
    # combine_type = inner, left, right, outer
    df = pd.merge(new_data, old_data, on='Product number', how='outer')

    df = df.rename(columns={"Product number": "Product ID",
                        "Name": "Name of product",
                        "Current price": "Old price"})

    nan_value = float("NaN")

    df.replace("", nan_value, inplace=True)
    df.dropna(subset=["Name of product"], inplace=True)

    df = df[['Product ID', 'Name of product',
             'New price', 'Old price']]

    print(df.columns)

    # df.to_excel('updated_table.xlsx')


if __name__ == "__main__":
    format_data()

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

三岁铭 2025-02-14 22:24:24

这是我的尝试。它将消息放在同一文件中的另一张纸上。最后的电子表格看起来像这样：

import os
import pandas as pd

old_data_filename = r"old_data.xlsx"
new_data_filename = r"new_data.xlsx"
new_spreadsheet_filename = r"updated_products.xlsx"

# Load spreadsheets into a dataframe and set their indexes to "Product number"
old_data_df = pd.read_excel(old_data_filename).set_index("Product number")
new_data_df = pd.read_excel(new_data_filename).set_index("Product number")

# Determine which products are new/missing, and store the corresponding 
# messages in a list, which will be written to its own spreadsheet at the end
old_data_products = set(old_data_df.index)
new_data_products = set(new_data_df.index)

new_products = new_data_products - old_data_products
missing_products = old_data_products - new_data_products

messages = [f"Product ID {product} is missing in new data file" for product in missing_products]
messages.extend(f"Product ID {product} is newly added" for product in new_products)
messages = [f"Message {i}) {message}" for i, message in enumerate(messages, start=1)]

# Keep the original product names
new_data_df.update(old_data_df["Name"])

# Old price is the same as new price unless the product is in old_data_df, in which 
# case it is old_data_df["Current price"]
new_data_df["Old price"] = new_data_df["New price"]
new_data_df["Old price"].update(old_data_df["Current price"])

# Rename the columns
new_data_df.reset_index(inplace=True)
new_data_df.rename(columns={"Product number": "Product ID", 
                            "Name": "Name of product"}, inplace=True)

# Remove all other columns except the ones we want
new_data_df = new_data_df[["Product ID", 
                           "Name of product", 
                           "New price", "Old price"]]

# Write the new products and messages to separate sheets in the same file
with pd.ExcelWriter(new_spreadsheet_filename) as writer:
    new_data_df.to_excel(writer, "Products", index=False)
    pd.DataFrame({"Messages": messages}).to_excel(writer, "Messages", index=False)

# Launch the new spreadsheet
os.startfile(new_spreadsheet_filename)

编辑：与实际电子表格一起使用的代码：

import os
import pandas as pd

old_data_filename = r"old_data.xlsx"
new_data_filename = r"new_data.xlsx"
new_spreadsheet_filename = r"updated_products.xlsx"

# Load spreadsheets into a dataframe and set their indexes to "Product number"
old_data_df = pd.read_excel(old_data_filename).set_index("Product ID")
new_data_df = pd.read_excel(new_data_filename).set_index("Product ID")

# Remove duplicated indexes for both the dataframes, keeping only the first occurrence
old_data_df = old_data_df[~old_data_df.index.duplicated()]
new_data_df = new_data_df[~new_data_df.index.duplicated()]

# Determine which products are new/missing, and store the corresponding 
# messages in a list, which will be written to its own spreadsheet at the end
old_data_products = set(old_data_df.index)
new_data_products = set(new_data_df.index)

new_products = new_data_products - old_data_products
missing_products = old_data_products - new_data_products

messages = [f"Product ID {product} is missing in new data file" for product in missing_products]
messages.extend(f"Product ID {product} is newly added" for product in new_products)
messages = [f"Message {i}) {message}" for i, message in enumerate(messages, start=1)]

# Keep the original product names
new_data_df.update(old_data_df["Name"])

# Old price is the same as new price unless the product is in old_data_df, in which 
# case it is old_data_df["Current price"]
new_data_df["Old price"] = new_data_df["New price"]
new_data_df["Old price"].update(old_data_df["Current price"])

# Rename the "Name" column to "Name of product"
new_data_df.rename(columns={"Name": "Name of product"}, inplace=True)

# Remove all other columns except the ones we want
new_data_df.reset_index(inplace=True)
new_data_df = new_data_df[["Product ID", 
                            "Name of product", 
                            "New price", "Old price"]]

# Write the new products and messages to separate sheets in the same file
with pd.ExcelWriter(new_spreadsheet_filename) as writer:
    new_data_df.to_excel(writer, "Products", index=False)
    pd.DataFrame({"Messages": messages}).to_excel(writer, "Messages", index=False)

# Launch the new spreadsheet
os.startfile(new_spreadsheet_filename)

This is my attempt. It puts the messages in another sheet in the same file. The final spreadsheet looks like this:

import os
import pandas as pd

old_data_filename = r"old_data.xlsx"
new_data_filename = r"new_data.xlsx"
new_spreadsheet_filename = r"updated_products.xlsx"

# Load spreadsheets into a dataframe and set their indexes to "Product number"
old_data_df = pd.read_excel(old_data_filename).set_index("Product number")
new_data_df = pd.read_excel(new_data_filename).set_index("Product number")

# Determine which products are new/missing, and store the corresponding 
# messages in a list, which will be written to its own spreadsheet at the end
old_data_products = set(old_data_df.index)
new_data_products = set(new_data_df.index)

new_products = new_data_products - old_data_products
missing_products = old_data_products - new_data_products

messages = [f"Product ID {product} is missing in new data file" for product in missing_products]
messages.extend(f"Product ID {product} is newly added" for product in new_products)
messages = [f"Message {i}) {message}" for i, message in enumerate(messages, start=1)]

# Keep the original product names
new_data_df.update(old_data_df["Name"])

# Old price is the same as new price unless the product is in old_data_df, in which 
# case it is old_data_df["Current price"]
new_data_df["Old price"] = new_data_df["New price"]
new_data_df["Old price"].update(old_data_df["Current price"])

# Rename the columns
new_data_df.reset_index(inplace=True)
new_data_df.rename(columns={"Product number": "Product ID", 
                            "Name": "Name of product"}, inplace=True)

# Remove all other columns except the ones we want
new_data_df = new_data_df[["Product ID", 
                           "Name of product", 
                           "New price", "Old price"]]

# Write the new products and messages to separate sheets in the same file
with pd.ExcelWriter(new_spreadsheet_filename) as writer:
    new_data_df.to_excel(writer, "Products", index=False)
    pd.DataFrame({"Messages": messages}).to_excel(writer, "Messages", index=False)

# Launch the new spreadsheet
os.startfile(new_spreadsheet_filename)

EDIT: Code that works with the actual spreadsheets:

import os
import pandas as pd

old_data_filename = r"old_data.xlsx"
new_data_filename = r"new_data.xlsx"
new_spreadsheet_filename = r"updated_products.xlsx"

# Load spreadsheets into a dataframe and set their indexes to "Product number"
old_data_df = pd.read_excel(old_data_filename).set_index("Product ID")
new_data_df = pd.read_excel(new_data_filename).set_index("Product ID")

# Remove duplicated indexes for both the dataframes, keeping only the first occurrence
old_data_df = old_data_df[~old_data_df.index.duplicated()]
new_data_df = new_data_df[~new_data_df.index.duplicated()]

# Determine which products are new/missing, and store the corresponding 
# messages in a list, which will be written to its own spreadsheet at the end
old_data_products = set(old_data_df.index)
new_data_products = set(new_data_df.index)

new_products = new_data_products - old_data_products
missing_products = old_data_products - new_data_products

messages = [f"Product ID {product} is missing in new data file" for product in missing_products]
messages.extend(f"Product ID {product} is newly added" for product in new_products)
messages = [f"Message {i}) {message}" for i, message in enumerate(messages, start=1)]

# Keep the original product names
new_data_df.update(old_data_df["Name"])

# Old price is the same as new price unless the product is in old_data_df, in which 
# case it is old_data_df["Current price"]
new_data_df["Old price"] = new_data_df["New price"]
new_data_df["Old price"].update(old_data_df["Current price"])

# Rename the "Name" column to "Name of product"
new_data_df.rename(columns={"Name": "Name of product"}, inplace=True)

# Remove all other columns except the ones we want
new_data_df.reset_index(inplace=True)
new_data_df = new_data_df[["Product ID", 
                            "Name of product", 
                            "New price", "Old price"]]

# Write the new products and messages to separate sheets in the same file
with pd.ExcelWriter(new_spreadsheet_filename) as writer:
    new_data_df.to_excel(writer, "Products", index=False)
    pd.DataFrame({"Messages": messages}).to_excel(writer, "Messages", index=False)

# Launch the new spreadsheet
os.startfile(new_spreadsheet_filename)

回复收藏 0 原文

~没有更多了~