Excel 比较两个 csv 文件并显示差异

发布于 2024-11-12 01:37:34 字数 481 浏览 0 评论 0原文

我正在比较两组大的 csv 文件和/或 csv 文件和 .txt 文件。我“认为”.txt 文件可能需要转换为 csv 文件只是为了简单起见,但这可能需要也可能不需要。我想使用 excel、c++ 或 python。我需要将一个“接受的”值列表与测量的列表进行比较,并找出它们之间的差异(如果有)。 Excel 可能是最简单的方法,但 Python 或 C++ 也可能同样有效。这不是家庭作业,所以不用担心这类事情。非常感谢代码建议和/或模板。或网站链接

编辑1

我已经阅读过有关Python的difflib或不同类的信息,但不熟悉如何使用它,并且可能比我想要的更多。

编辑2

这两个文件都有一系列列(它们之间没有画线或任何东西),并且在这些“命名”列下面会有数字。我需要将文件 1 中第 1 列第 1 列中的数字与文件 2 中第 1 列第 1 列中的数字进行比较,如果存在差异,请显示另一个 csv 文件中的差异

I'm looking to compare two big sets of csv files and/or a csv file and a .txt file. I "think" the .txt file may need to be converted to a csv file just for simplicity sake but that may or may not be needed. I either want to use excel, c++, or python. I need to compare one "accepted" value list to a list that is measured and find the difference between them if there is one. Excel may be the easiest way to do this but python or c++ may work just as well. This is not homework so don't worry about that sort of thing. Code advice and/or templates is greatly appreciated. or links to websites

EDIT 1

I've read about Python's difflib or differ class but unfamiliar how to use it and may be more than I want.

EDIT 2

The Files both will have a series of columns(not with lines drawn between them or anything) and below those "named" columns there will be numbers. I need to compare the number in column 1 spot one in file one to column 1 spot one of file 2 and if there is a difference show the difference in another csv file

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

雪花飘飘的天空 2024-11-19 01:37:34

您可以使用 ADO(ODBC/JET/OLEDB 文本驱动程序)将“合适的”.txt/.csv/.tab/.flr 文件视为 SQL 数据库中来自每种支持 COM 的语言的表。然后可以使用 SQL 的强大功能来完成比较(DISTINCT、GROUP、(LEFT)JOINS,...)。

添加关于您的评论:

这是您的问题,我不想把您推到您不想去的地方。但如果您需要比较表格数据,SQL 是一个很好的(最好的?)工具。作为发现两个 .txt 文件中差异的脚本输出的证据:

======= The .txt files to play with
------- file1.txt
"AC";"AM"
40000;-19083,00
40100;20000,00
40200;350004,00
40300;3498,99

------- file2.txt
"AC";"AM"
40000;-19083,00
40300;3498,99
40105;-234567,00
40200;350,00

======= Some diagnostic SQL
------- <NULL> indicates: In F1 but not in F2 (LEFT JOIN)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC)
------- Result
AC      File1   File2
40000   -19083  -19083
40100   20000   <NULL>
40200   350004  350
40300   3498,99 3498,99

------- <NULL> indicates: Not in the other file (LEFT JOIN, UNION)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.txt] AS T2 LEFT JOIN [file1.txt] AS T1
 ON (T1.AC = T2.AC)
------- Result
AC      File1   File2
40000   -19083  -19083
40100   20000   <NULL>
40105   <NULL>  -234567
40200   350004  350
40300   3498,99 3498,99

------- the problems: missing, different values
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) WHERE T2.AM IS NULL OR T1.AM <> T2.AM UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.
txt] AS T2 LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC) WHERE T1.AM IS NULL OR T1.AM <>
T2.AM
------- Result
AC      File1   File2
40100   20000   <NULL>
40105   <NULL>  -234567
40200   350004  350

进一步添加:

文章涉及 ADO 和文本文件;查找文件 adoNNN.chm
(NNN=版本号,例如210)在您的计算机上;这是一本好书
阿多。

您可以使用 Access 或 OpenOffice Base 来试验 SQL 语句
应用于链接/引用(未导入!)文本数据库。

掌握了最初的障碍后,脚本/程序就会很容易:连接
到数据库,即包含文件和 schema.ini 的文件夹
file 来定义 files=tables 的结构。

上面的输出是通过以下方式生成的:

  Const adClipString = 2

  Dim oFS  : Set oFS = CreateObject( "Scripting.FileSystemObject" )
  Dim sDir : sDir    = oFS.GetAbsolutePathName( ".\txt" )
  Dim oDB  : Set oDb = CreateObject( "ADODB.Connection" )
  oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
  Dim sSQL
  Dim sFiNa
  WScript.Echo "=======", "The .txt files to play with"
  For Each sFiNa In Array( "file1.txt", "file2.txt"  )
      WScript.Echo "-------", sFiNa
      WScript.Echo oFS.OpenTextFile( "txt\" & sFiNa ).ReadAll()
  Next

  WScript.Echo "=======", "Some diagnostic SQL"
  Dim aSQL
  For Each aSQL In Array( _
       Array(   "<NULL> indicates: In F1 but not in F2 (LEFT JOIN)" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                ), " " ) ) _
     , Array(   "<NULL> indicates: Not in the other file (LEFT JOIN, UNION)" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                   , "UNION" _
                   , "SELECT T2.AC, T1.AM, T2.AM FROM" _
                   , "[file2.txt] AS T2" _
                   , "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
                ), " " ) ) _
     , Array(   "the problems: missing, different value" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                   , "WHERE T2.AM IS NULL OR T1.AM <> T2.AM" _
                   , "UNION" _
                   , "SELECT T2.AC, T1.AM, T2.AM FROM" _
                   , "[file2.txt] AS T2" _
                   , "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
                   , "WHERE T1.AM IS NULL OR T1.AM <> T2.AM" _
                ), " " ) ) _
     )
     sSQL = aSQL( 1 )
     WScript.Echo "-------", aSQL( 0 )
     WScript.Echo sSQL
     Dim oRS : Set oRS = oDB.Execute( sSQL )
     WScript.Echo "------- Result"
     WScript.Echo Join( Array( "AC", "File1", "File2" ), vbTab )
     WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
  Next
  oDB.Close

如果删除/忽略脂肪(创建 SQL 语句、诊断输出),它会沸腾
减少至 6 行

  Dim oDB  : Set oDb = CreateObject( "ADODB.Connection" )
  oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
  sSQL = "..."
  Dim oRS : Set oRS = oDB.Execute( sSQL )
  WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
  oDB.Close

,可以轻松“移植”到每种支持 COM 的语言,因为 ADO
物体承担了所有繁重的工作。当您
想要保存结果集:只需旋转分隔符/分隔符/空参数
并将其转储到文件

  oFS.CreateTextFile( ... ).WriteLine oRS.GetString( _
    adClipString, , ",", vbCrLf, ""
  )

(不要忘记将该表的定义添加到您的 schema.ini 中)。的
当然你也可以使用“SELECT/INSERT INTO”,但这样的语句可能不会
很容易正确/通过 ADO 文本驱动程序的解析器。

添加计算:

从 5 x 2 主/批准文件开始,其中包含:

Num0    Num1    Num2    Num3    Num4
7,6     6,1     3,8     0,9     8,9
0,9     9,4     4,7     8,8     9,9

通过附加 Spot 列将其转换为预期.txt

Num0    Num1    Num2    Num3    Num4    Spot
7,6     6,1     3,8     0,9     8,9     1
0,9     9,4     4,7     8,8     9,9     2

,使其符合

[expected.txt]
ColNameHeader=True
CharacterSet=1252
Format=Delimited(;)
Col1=Num0 Float
Col2=Num1 Float
Col3=Num2 Float
Col4=Num3 Float
Col5=Num4 Float
Col6=Spot Integer

schema.ini 文件中的要求。类似地,转换一个测量文件,如:

Num0    Num1    Num2    Num3    Num4
7,1     1,1     3,8     0,9     8,9
0,9     9,4     4,7     8,8     9,9

到measured.txt

Num0    Num1    Num2    Num3    Num4    Spot
7,1     1,1     3,8     0,9     8,9     1
0,9     9,4     4,7     8,8     9,9     2

Apply

  sSQL = Join( Array( _
         "SELECT E.Num0 - M.Num0 AS Dif0" _
      ,       ", E.Num1 - M.Num1 AS Dif1" _
      ,       ", E.Num2 - M.Num2 AS Dif2" _
      ,       ", E.Num3 - M.Num3 AS Dif3" _
      ,       ", E.Num4 - M.Num4 AS Dif4" _
      ,       ", E.Spot          AS Spot" _
      ,  "FROM [expected.txt] AS E" _
      ,  "INNER JOIN [measured.txt] AS M" _
      ,  "ON E.Spot = M.Spot" _
  ), " " )

将结果集写入differences.txt

aFNames = Array( "Num0", ... "Spot" )
oFS.CreateTextFile( sFSpec ).Write _
Join( aFNames, sFSep ) & sRSep& oRS.GetString( adClipString, , sFSep, sRSep, "" )

你会得到:

Num0    Num1    Num2    Num3    Num4    Spot
0,5     5       0       0       0       1
0       0       0       0       0       2

You can use ADO (ODBC/JET/OLEDB Text Driver) to treat 'decent' .txt/.csv/.tab/.flr files as tables in a SQL Database from every COM-enabled language. Then the comparisons could be done using the power of SQL (DISTINCT, GROUP, (LEFT) JOINS, ...).

Added with regard to your comment:

It's your problem and I don't want to push you where you don't want to go. But SQL is a good (the best?) tool, if you need to compare tabular data. As evidence the output of a script that spots the differences in two .txt files:

======= The .txt files to play with
------- file1.txt
"AC";"AM"
40000;-19083,00
40100;20000,00
40200;350004,00
40300;3498,99

------- file2.txt
"AC";"AM"
40000;-19083,00
40300;3498,99
40105;-234567,00
40200;350,00

======= Some diagnostic SQL
------- <NULL> indicates: In F1 but not in F2 (LEFT JOIN)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC)
------- Result
AC      File1   File2
40000   -19083  -19083
40100   20000   <NULL>
40200   350004  350
40300   3498,99 3498,99

------- <NULL> indicates: Not in the other file (LEFT JOIN, UNION)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.txt] AS T2 LEFT JOIN [file1.txt] AS T1
 ON (T1.AC = T2.AC)
------- Result
AC      File1   File2
40000   -19083  -19083
40100   20000   <NULL>
40105   <NULL>  -234567
40200   350004  350
40300   3498,99 3498,99

------- the problems: missing, different values
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) WHERE T2.AM IS NULL OR T1.AM <> T2.AM UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.
txt] AS T2 LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC) WHERE T1.AM IS NULL OR T1.AM <>
T2.AM
------- Result
AC      File1   File2
40100   20000   <NULL>
40105   <NULL>  -234567
40200   350004  350

Further additions:

This article deals with ADO and text files; look for a file adoNNN.chm
(NNN=Version number, e.g. 210) on your computer; this is a good book about
ADO.

You can use Access or OpenOffice Base to experiment with SQL statements
applied to a linked/referenced (not imported!) text database.

A script/program will be easy after you mastered the initial hurdle: connecting
to the the database, i.e. to a folder containing the files and a schema.ini
file to define the structure of the files=tables.

The output above was generated by:

  Const adClipString = 2

  Dim oFS  : Set oFS = CreateObject( "Scripting.FileSystemObject" )
  Dim sDir : sDir    = oFS.GetAbsolutePathName( ".\txt" )
  Dim oDB  : Set oDb = CreateObject( "ADODB.Connection" )
  oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
  Dim sSQL
  Dim sFiNa
  WScript.Echo "=======", "The .txt files to play with"
  For Each sFiNa In Array( "file1.txt", "file2.txt"  )
      WScript.Echo "-------", sFiNa
      WScript.Echo oFS.OpenTextFile( "txt\" & sFiNa ).ReadAll()
  Next

  WScript.Echo "=======", "Some diagnostic SQL"
  Dim aSQL
  For Each aSQL In Array( _
       Array(   "<NULL> indicates: In F1 but not in F2 (LEFT JOIN)" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                ), " " ) ) _
     , Array(   "<NULL> indicates: Not in the other file (LEFT JOIN, UNION)" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                   , "UNION" _
                   , "SELECT T2.AC, T1.AM, T2.AM FROM" _
                   , "[file2.txt] AS T2" _
                   , "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
                ), " " ) ) _
     , Array(   "the problems: missing, different value" _
              , Join( Array( _
                     "SELECT T1.AC, T1.AM, T2.AM FROM" _
                   , "[file1.txt] AS T1" _
                   , "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
                   , "WHERE T2.AM IS NULL OR T1.AM <> T2.AM" _
                   , "UNION" _
                   , "SELECT T2.AC, T1.AM, T2.AM FROM" _
                   , "[file2.txt] AS T2" _
                   , "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
                   , "WHERE T1.AM IS NULL OR T1.AM <> T2.AM" _
                ), " " ) ) _
     )
     sSQL = aSQL( 1 )
     WScript.Echo "-------", aSQL( 0 )
     WScript.Echo sSQL
     Dim oRS : Set oRS = oDB.Execute( sSQL )
     WScript.Echo "------- Result"
     WScript.Echo Join( Array( "AC", "File1", "File2" ), vbTab )
     WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
  Next
  oDB.Close

If you delete/ignore the fat (create SQL statements, diagnostics output), it boils
down to 6 lines

  Dim oDB  : Set oDb = CreateObject( "ADODB.Connection" )
  oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
  sSQL = "..."
  Dim oRS : Set oRS = oDB.Execute( sSQL )
  WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
  oDB.Close

which can be 'ported' easily to every COM-enabled language, because the ADO
objects do all the heavy lifting. The .GetString method comes handy, when you
want to save a resultset: just twiddle the separator/delimiter/Null arguments
and dump it to file

  oFS.CreateTextFile( ... ).WriteLine oRS.GetString( _
    adClipString, , ",", vbCrLf, ""
  )

(don't forget to add a definition for that table to your schema.ini). Of
course you also can use a "SELECT/INSERT INTO", but such statements may not
be easy to get right/passed the ADO Text Driver's parser.

Addition wrt Computations:

Start with a 5 x 2 master/approved file containing:

Num0    Num1    Num2    Num3    Num4
7,6     6,1     3,8     0,9     8,9
0,9     9,4     4,7     8,8     9,9

transform it to expected.txt

Num0    Num1    Num2    Num3    Num4    Spot
7,6     6,1     3,8     0,9     8,9     1
0,9     9,4     4,7     8,8     9,9     2

by appending the Spot column so it conforms to

[expected.txt]
ColNameHeader=True
CharacterSet=1252
Format=Delimited(;)
Col1=Num0 Float
Col2=Num1 Float
Col3=Num2 Float
Col4=Num3 Float
Col5=Num4 Float
Col6=Spot Integer

in your schema.ini file. Similarly, transform a measure file like:

Num0    Num1    Num2    Num3    Num4
7,1     1,1     3,8     0,9     8,9
0,9     9,4     4,7     8,8     9,9

to measured.txt

Num0    Num1    Num2    Num3    Num4    Spot
7,1     1,1     3,8     0,9     8,9     1
0,9     9,4     4,7     8,8     9,9     2

Apply

  sSQL = Join( Array( _
         "SELECT E.Num0 - M.Num0 AS Dif0" _
      ,       ", E.Num1 - M.Num1 AS Dif1" _
      ,       ", E.Num2 - M.Num2 AS Dif2" _
      ,       ", E.Num3 - M.Num3 AS Dif3" _
      ,       ", E.Num4 - M.Num4 AS Dif4" _
      ,       ", E.Spot          AS Spot" _
      ,  "FROM [expected.txt] AS E" _
      ,  "INNER JOIN [measured.txt] AS M" _
      ,  "ON E.Spot = M.Spot" _
  ), " " )

Write the resultset to differences.txt

aFNames = Array( "Num0", ... "Spot" )
oFS.CreateTextFile( sFSpec ).Write _
Join( aFNames, sFSep ) & sRSep & oRS.GetString( adClipString, , sFSep, sRSep, "" )

and you get:

Num0    Num1    Num2    Num3    Num4    Spot
0,5     5       0       0       0       1
0       0       0       0       0       2
太阳公公是暖光 2024-11-19 01:37:34

您不需要编码,您可以使用文本编辑器中的替换功能使两个文件中的分隔符相同(空格或逗号),并使用 TortoiseSVN 的图形差异工具进行比较: http://tortoisesvn.net/

You don't need to code, you can make separators the same in both files (spaces or commas) using replace function in text editor and compare it using graphical diff tool from TortoiseSVN: http://tortoisesvn.net/

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文