Compare two Excel files and print the differences using Python?

Posted 2025-02-06 06:13:34

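I have two XLSX files and need to print the differences in each cell. The code I am using now works, but I need to ignore the first column in each XLSX file, and I am not sure how to add that exception to my current code.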

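import xlrd  # xlrd 2.0+ reads only legacy .xls files; .xlsx needs xlrd < 2.0 or another reader such as openpyxl
from itertools import zip_longest

ds1 = xlrd.open_workbook("PATH1")
ds2 = xlrd.open_workbook("PATH2")
SHEET1_ds1 = ds1.sheet_by_index(0)
SHEET1_ds2 = ds2.sheet_by_index(0)

# Walk every row that exists in either sheet
for rownum in range(max(SHEET1_ds1.nrows, SHEET1_ds2.nrows)):
    if rownum < SHEET1_ds1.nrows and rownum < SHEET1_ds2.nrows:
        row_rb1 = SHEET1_ds1.row_values(rownum)
        row_rb2 = SHEET1_ds2.row_values(rownum)

        # Compare cell by cell; zip_longest pads the shorter row with None
        for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
            if c1 != c2:
                print("Row {} Col {} - {} != {}".format(rownum + 1, colnum + 1, c1, c2))
    else:
        # The row exists in only one of the two sheets
        print("Row {} missing".format(rownum + 1))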
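One way to skip the first column in a loop like the one above is to slice each row before pairing up the cells. The sketch below is a hypothetical helper (`diff_rows` is not part of xlrd) that works on plain lists of row values, for example `[sheet.row_values(i) for i in range(sheet.nrows)]` built from an xlrd sheet, so the comparison logic can be tried without a workbook. The sample rows are made up for illustration. `skip_cols=1` drops column A, and the column numbers in the messages still refer to the original sheet.

```python
from itertools import zip_longest


def diff_rows(rows1, rows2, skip_cols=1):
    """Compare two sheets given as lists of row values (hypothetical helper).

    The first `skip_cols` columns of every row are ignored. Returns a list of
    "Row X Col Y - a != b" and "Row X missing" messages, using 1-based row and
    column numbers of the original sheet.
    """
    messages = []
    for rownum, (row1, row2) in enumerate(zip_longest(rows1, rows2)):
        if row1 is None or row2 is None:
            # The row exists in only one of the two sheets
            messages.append("Row {} missing".format(rownum + 1))
            continue
        # Slice off the ignored leading columns, then pair the remaining cells;
        # start=skip_cols keeps colnum aligned with the original column index
        pairs = zip_longest(row1[skip_cols:], row2[skip_cols:])
        for colnum, (c1, c2) in enumerate(pairs, start=skip_cols):
            if c1 != c2:
                messages.append(
                    "Row {} Col {} - {} != {}".format(rownum + 1, colnum + 1, c1, c2)
                )
    return messages


# Hypothetical sample data: column A differs on every row but is skipped
rows1 = [["id", "name", "qty"], [1, "a", 10], [2, "b", 20]]
rows2 = [["key", "name", "qty"], [7, "a", 11], [8, "b", 20], [9, "c", 30]]

for line in diff_rows(rows1, rows2):
    print(line)
```

With this sample data the loop prints `Row 2 Col 3 - 10 != 11` and `Row 4 missing`. Passing `skip_cols=0` compares every column, which matches the behavior of the original loop.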


救赎№ 2025-02-13 06:13:34

How about this approach?

import pandas as pd

# Read both Excel files into DataFrames.
# Several optional parameters are shown for illustration; not all of them may be needed.
# usecols="B:Z" reads only columns B through Z, so the first column (A) is ignored.
df1 = pd.read_excel('C:\\Users\\Excel\\Desktop\\Coding\\Python\\Excel\\Compare Two Excel Files\\Book1.xlsx', 'Sheet1', na_values=['NA'], header=0,  skiprows=0, nrows=1000, usecols="B:Z")
df2 = pd.read_excel('C:\\Users\\Excel\\Desktop\\Coding\\Python\\Excel\\Compare Two Excel Files\\Book2.xlsx', 'Sheet1', na_values=['NA'], header=0,  skiprows=0, nrows=1000, usecols="B:Z")

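# Order by account number (the "H1" column in this example) and reset the index
# so that rows are numbered 0..n-1 in that order in both frames.
# sort_index(by=...) has been removed from pandas; sort_values is its replacement,
# and the sorted result has to be assigned back.
df1 = df1.sort_values(by=["H1"]).reset_index(drop=True)
df2 = df2.sort_values(by=["H1"]).reset_index(drop=True)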

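# Create a diff function: keep the value if both cells match, otherwise show "old ---> new".
# Two empty cells (NaN) count as equal, since NaN != NaN.

def report_diff(x):
    if x[0] == x[1] or (pd.isna(x[0]) and pd.isna(x[1])):
        return x[0]
    return '{} ---> {}'.format(*x)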

# pd.Panel no longer exists in pandas, so pair the two frames up cell by cell instead.
# This assumes both files have the same columns and the same number of rows.

diff_output = df1.copy()
for col in df1.columns:
    diff_output[col] = [report_diff((a, b)) for a, b in zip(df1[col], df2[col])]

# Applying report_diff to every cell shows the old and new values side by side
# ("old ---> new") wherever they differ.

print(diff_output.tail())


# Flag rows that contain at least one change so they are easy to find:
# has_change is applied to each row and returns "Y" or "N".

def has_change(row):
    if "--->" in row.to_string():
        return "Y"
    else:
        return "N"


diff_output['has_change'] = diff_output.apply(has_change, axis=1)
print(diff_output.tail())

# Show only the rows that contain a change:

print(diff_output[diff_output.has_change == 'Y'])


# Finally, let’s write it out to an Excel file:

diff_output[(diff_output.has_change == 'Y')].to_excel('C:\\Users\\Excel\\Desktop\\Coding\\Python\\Excel\\Compare Two Excel Files\\diff.xlsx')

See the link below for all details.

https://pbpython.com/excel-diff-pandas.html
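On pandas 1.1 or later, `DataFrame.compare` can stand in for the hand-written cell-by-cell diff, provided both sheets have identical row and column labels. The sketch below uses small made-up DataFrames in place of the two `read_excel` results (the "ID", "name" and "qty" columns are illustrative only) and drops the first column by position with `iloc[:, 1:]`, which, like `usecols="B:Z"`, leaves column A out of the comparison.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets; "ID" plays the role of column A
df1 = pd.DataFrame({"ID": [1, 2, 3], "name": ["a", "b", "c"], "qty": [10, 20, 30]})
df2 = pd.DataFrame({"ID": [7, 8, 9], "name": ["a", "B", "c"], "qty": [10, 20, 31]})

# Ignore the first column by position, then compare cell by cell.
# compare() requires identically labeled frames (same index and columns).
diff = df1.iloc[:, 1:].compare(df2.iloc[:, 1:])

# Only rows and columns that contain a difference are kept; "self" holds the
# df1 value, "other" the df2 value, and unchanged cells in a kept row are NaN.
print(diff)
```

With real files, the two `pd.read_excel(...)` results (sorted and passed through `reset_index(drop=True)` so the rows line up) would take the place of the stand-ins.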
