Easiest way to compare two Excel files in Java?
I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?
Sure, I could write the code myself, but I was wondering whether a trusted third-party library (e.g. Spring or Apache Commons) already has a method that does this.
Comments (12)
You might consider using my project simple-excel which provides a bunch of Hamcrest Matchers to do the job.
When you do something like the following,
You'd see, for example,
That way, you can run it from your automated tests and get meaningful feedback whilst you're developing.
You can read more about it in this article on my site.
Here's what I ended up doing (with the heavy lifting being done by DBUnit):
This compares the data in the two files, with no risk of spurious failures caused by irrelevant metadata that might differ. Hope this helps someone.
A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.
However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out.
So you'll need another kind of comparison for your test.
I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).
Apache POI might be useful for reading the file.
BTW: Reading a whole file to check its correctness would generally not be considered a unit test. That's an integration test...
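The "canonical form" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `CanonicalSheet` and its methods are hypothetical, and the cell values are assumed to have already been read from each workbook into lists of strings (for example with Apache POI), which is not shown here. The choice to drop trailing empty cells and trailing empty rows is also an assumption about what counts as irrelevant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns already-extracted cell values into a CSV-like
// canonical string so that two sheets can be compared as plain text.
class CanonicalSheet {

    // Renders rows as quoted, comma-separated lines. Trailing empty cells and
    // trailing empty rows are dropped (an assumption about what is irrelevant).
    static String canonicalize(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            int end = row.size();
            while (end > 0 && isEmpty(row.get(end - 1))) {
                end--;
            }
            List<String> cells = new ArrayList<>();
            for (int i = 0; i < end; i++) {
                cells.add(quote(row.get(i)));
            }
            lines.add(String.join(",", cells));
        }
        int last = lines.size();
        while (last > 0 && lines.get(last - 1).isEmpty()) {
            last--;
        }
        return String.join("\n", lines.subList(0, last));
    }

    // A cell counts as empty when it is null or has no characters.
    static boolean isEmpty(String cell) {
        return cell == null || cell.isEmpty();
    }

    // Quotes a cell CSV-style, doubling any embedded quote characters.
    static String quote(String cell) {
        String value = cell == null ? "" : cell;
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

In a JUnit test you would then assert that `CanonicalSheet.canonicalize(expectedRows)` equals `CanonicalSheet.canonicalize(actualRows)`; because both sides are plain strings, a failure message shows exactly which cells differ.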
I needed to do something similar and was already using the Apache POI library in my project to create Excel files. So I opted to use the included ExcelExtractor interface to export both workbooks as a string of text and asserted that the strings were equal. There are implementations for both HSSF for .xls as well as XSSF for .xlsx.
Dump to string:
ExcelExtractor has some options for what should be included in the string dump. I found its defaults useful: they include the sheet names as well as the text contents of the cells.
The easiest way I've found is to use Tika.
I use it like this:
You could use javaxdelta to check whether the two files are the same. It's available from here:
http://javaxdelta.sourceforge.net/
Just found out there's something in commons-io's FileUtils. Thanks for the other answers.
Please take a look at this page on comparing binary files: http://www.velocityreviews.com/forums/t123770-re-java-code-for-determining-binary-file-equality.html
Tiger
You may use Beyond Compare 3 which can be started from command-line and supports different ways to compare Excel files, including:
To test only the content of the first sheet, in Kotlin (easily converted to Java):
and add an assertion in your code:
As you can see,
doc.toList().first()
takes only the first sheet of the document; if you need to compare each sheet, change the code a little. It is also a good idea to ignore empty-string ("") cells. I didn't need that, but it is simple to add if you do.
This may also be useful:
and the second document, read from a file to compare against:
The best way for me, after long research:
Use Apache POI as an external library.
Use XSSF for .xlsx files.
If you want to compare .xls files, you have to use HSSF.
Good luck
Maybe... compare MD5 digests of each file? I'm sure there are a lot of ways to do it. You could just open both files and compare each byte.
EDIT: James stated how the XLS format might have differences in the metadata. Perhaps you should use the same interface you used to generate the xls files to open them and compare the values from cell to cell?
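For reference, the byte-level checks mentioned at the start of this answer can be sketched with the JDK alone. This is a minimal illustration, not a recommendation: the class name `ByteLevelCompare` and its methods are made up for the example, and both files are read fully into memory, which assumes they are small. As the edit says, Excel files with identical cell data may still fail this check because of differing metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class; the name and methods are illustrative only.
class ByteLevelCompare {

    // Computes the MD5 digest of a file's full contents.
    static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        return digest.digest(Files.readAllBytes(file));
    }

    // True when both files produce the same MD5 digest.
    static boolean sameDigest(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(md5(a), md5(b));
    }

    // True when both files are byte-for-byte identical.
    static boolean sameBytes(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }
}
```

In a JUnit test this would be used as something like `assertTrue(ByteLevelCompare.sameBytes(expectedPath, actualPath))`. Comparing bytes directly is simpler than hashing when both files are at hand; a digest is mainly useful when only a stored checksum of the expected file is available.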