How can I programmatically compare the contents of two archive files?
I'm doing some testing to make sure that the all-in-one zip file I created with a script produces the same output as the contents of several zip files that I have to create manually by clicking through a web interface. The zips will therefore have different folder structures.
Of course I could extract them manually and scan them with my powerful eyeball technique, or, even lazier, write a script to do it. But before I invest more time and get accused by my boss of company time robbery, I'm asking: is there a better way to do this?
I'm using a Perl LAMP stack, by the way.
Thanks.
Answers (4)
You can use Perl's Archive::Zip or Python's zipfile module to extract the file names, sizes and CRC checksums of the files in the archives. Write the results to a file, sorted by file name (ignore the path).
For the several smaller ZIPs, merge the results of the script runs (cat list1 list2 list3 | sort). Now you can use diff to compare the results.
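A minimal Python sketch of this approach, assuming Python 3.6+ and only the standard library. It reads each entry's name, uncompressed size and CRC-32 directly from the archive's directory (ZipInfo.file_size and ZipInfo.CRC), so nothing has to be extracted. The names zip_manifest and compare are hypothetical, chosen for this example, and difflib stands in for the external diff.

```python
import difflib
import sys
import zipfile
from pathlib import PurePosixPath


def zip_manifest(*zip_paths):
    """Return sorted 'name<TAB>size<TAB>crc' lines for all files in the archives.

    Only the base name of each entry is kept (the path is ignored), so archives
    with different folder layouts can still be compared. Passing several
    archives merges their listings, like `cat list1 list2 list3 | sort`.
    """
    lines = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.is_dir():  # directory entries carry no content
                    continue
                name = PurePosixPath(info.filename).name  # zip paths always use '/'
                lines.append(f"{name}\t{info.file_size}\t{info.CRC:08x}")
    return sorted(lines)


def compare(script_lines, web_lines):
    """Unified diff of two manifests; an empty list means they match."""
    return list(difflib.unified_diff(script_lines, web_lines,
                                     fromfile="script", tofile="web", lineterm=""))


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python zip_manifest.py ALL_IN_ONE.zip WEB1.zip [WEB2.zip ...]
    script_zip, *web_zips = sys.argv[1:]
    diff = compare(zip_manifest(script_zip), zip_manifest(*web_zips))
    print("\n".join(diff) if diff else "Archives have the same contents")
    sys.exit(1 if diff else 0)
```

The file names in the usage comment are placeholders. Because paths are dropped, a file that ended up in the wrong folder would not be detected; only names, sizes and CRCs are compared.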
I can wholeheartedly recommend Beyond Compare. Unless you're really underpaid, it's the biggest bang for your (boss's) buck.
[Edit] I seem to have skimmed over the different folder structure, sorry about that. Beyond Compare can compare all files in folders that share the same folder structure. It does not (I believe) have the intelligence to search for matching files in different folders.
Regards,
Lieven
Create a CRC checksum for your files.
If the checksums of the original files and the unzipped files match, you can be reasonably sure the files are the same (a CRC can in principle collide, but it is very unlikely for an accidental difference to go undetected). It even works for non-text data.
A checksum can easily be created with an external program such as "SFV Checker" or programmatically (.NET and Java, for example, include libraries to do this).
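As a rough Python counterpart to those .NET/Java libraries (a swapped-in choice, not something the answer prescribes), zlib.crc32 can be fed a file in chunks and continued across calls. The helper names file_crc32, tree_crcs and differing_files are hypothetical. Keys are paths relative to the scanned folder, so this compares folders with the same layout; for different layouts you would need a different key, such as the base name. Requires Python 3.8+ for the := operator.

```python
import os
import zlib


def file_crc32(path, chunk_size=1 << 16):
    """CRC-32 of one file, read in chunks so large files need not fit in memory."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return crc


def tree_crcs(root):
    """Map every file under root (relative, '/'-separated path) to its CRC-32."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            result[rel] = file_crc32(full)
    return result


def differing_files(original_dir, unzipped_dir):
    """Sorted relative paths whose CRC differs or that exist on only one side."""
    a, b = tree_crcs(original_dir), tree_crcs(unzipped_dir)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from differing_files("original", "unzipped") (placeholder folder names) means every file has a matching checksum.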
Taking a cue from Carra's answer: if A.zip is your single big archive and B.zip is the archive generated through the web interface, use the following algorithm:
1. Extract all files from A.zip and, descending recursively into subfolders, compute the checksum of each file in the folder where the contents were extracted (using cksum, md5sum, etc.). Sort this output (pipe it through sort) and save it to a file (say A.txt).
2. Do the same for B.zip and generate B.txt.
3. Compare A.txt with B.txt; they should be exactly the same, provided you strip the paths from each line first, since the folder structures differ.
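The same extract, checksum, sort and compare steps can be sketched in Python with only the standard library, as an alternative to the md5sum | sort shell pipeline. Assumptions: hashlib.md5 plays the role of md5sum, the output format "digest  name" imitates md5sum's, and only base names are kept because the layouts differ. checksum_listing and write_listing are hypothetical names; several web-generated zips can be passed at once.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def checksum_listing(*zip_paths):
    """Extract the archives and return sorted 'md5  name' lines, like md5sum | sort.

    Each archive goes into its own subfolder of a temporary directory, which is
    deleted afterwards. Only base names are kept because the folder structures
    of the two sides differ.
    """
    lines = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, zip_path in enumerate(zip_paths):
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(Path(tmp) / str(i))
        for path in Path(tmp).rglob("*"):  # recursive walk over extracted files
            if path.is_file():
                digest = hashlib.md5(path.read_bytes()).hexdigest()
                lines.append(f"{digest}  {path.name}")
    return sorted(lines)


def write_listing(lines, out_path):
    """Save the listing, one entry per line (e.g. to A.txt or B.txt)."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, checksum_listing("A.zip") == checksum_listing("B1.zip", "B2.zip") performs step 3 directly (the file names are placeholders). read_bytes() loads each file fully into memory, which is fine for a test harness but not for very large files.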
OR
Use unzip -l to get the file/directory lists of both (zip) archives, then flatten the hierarchy of the user-generated zip files and compare the result with the contents of your script-generated zip file using something like diff. By flattening the hierarchy I mean that you may need to do some kind of preprocessing on one or both lists (for example, dropping the size and date columns and the leading folder names) before you can make a meaningful comparison with diff.
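A Python sketch of this second approach. Instead of parsing the text output of unzip -l (whose size, date and total lines would need stripping anyway), it reads the entry names with zipfile.ZipFile.namelist(), and difflib.unified_diff stands in for diff. The flattening shown here is just one possible preprocessing step, assumed for illustration: directory entries are skipped and each path is reduced to its base name. flat_listing and listing_diff are hypothetical names.

```python
import difflib
import zipfile
from pathlib import PurePosixPath


def flat_listing(*zip_paths):
    """Sorted file names from the archives with the folder hierarchy flattened."""
    names = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            names.extend(
                PurePosixPath(n).name
                for n in zf.namelist()
                if not n.endswith("/")  # skip directory entries
            )
    return sorted(names)


def listing_diff(script_zip, *web_zips):
    """Unified diff of the flattened listings; an empty list means they match."""
    return list(
        difflib.unified_diff(
            flat_listing(script_zip),
            flat_listing(*web_zips),
            fromfile="script",
            tofile="web",
            lineterm="",
        )
    )
```

Lines starting with + are names only present in the web-generated zips, and lines starting with - are names only present in the script-generated one. This checks names, not contents.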