Why does the comparison of two md5sum files not work properly?
I have two lists of files with their md5sum checksums, and the lists have different paths for the same files.
Example of content in the first file with checksums (server.list):
2c03ff18a643a1437ec0cf051b8b7b9d /tmp/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /tmp/fastq1_L001_R2_001.fastq.gz/
6e6bcd84f264233cf7c428c0cfdc0c03 tmp/fastq1_L002_R1_001.fastq.gz
Example of content in the second file with checksums (downloaded.list):
2c03ff18a643a1437ec0cf051b8b7b9d /home/projects/fastq1_L001_R1_001.fastq.gz
c430f587aba1aa9f4fdf69aeb4526621 /home/projects/fastq1_L001_R2_001.fastq.gz
6e6bcd84f264233cf7c428c0cfdc0c03 /home/projects/fastq1_L002_R1_001.fastq.gz
When I run the following command, I get the following output:
awk -F"/" 'FNR==NR{filearray[$1]=$NF; next }!($1 in filearray){printf "%s has a different md5sum\n",$NF}' downloaded.list server.list
fastq1_L001_R1_001.fastq.gz has a different md5sum
fastq1_L001_R2_001.fastq.gz has a different md5sum
fastq1_L002_R2_001.fastq.gz has a different md5sum
Why am I getting this message when the first column is the same in both files? Can someone enlighten me on this issue?
Edit:
If I remove the path and leave only the file name, it works just fine.
Edit 2:
As pointed out, the file path can take another form, one that does not start with /. In this case, I cannot use / as the field separator.
Comments (2)
Assumptions:
Sample data:
One awk idea to address whitespace issues as well as verifying filename matches:
This generates:
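A minimal sketch of an approach along these lines, not the answerer's original code. It assumes each line is a checksum followed by whitespace and a path, and that paths contain no spaces. The function name compare_md5_lists and the wording of the "matches" and "missing" messages are hypothetical. Splitting on whitespace (awk's default) instead of / keeps the checksum free of stray blanks, and the helper reduces each path to its bare file name, so a leading directory, a missing leading /, or a trailing / no longer matters.

```shell
# Compare two md5sum-style lists by file name, ignoring directories.
# $1: reference list, $2: list to check against it.
compare_md5_lists() {
    awk '
    # Reduce a path to its last component, tolerating trailing slashes.
    function basename(path) {
        sub(/\/+$/, "", path)   # drop trailing "/"
        sub(/.*\//, "", path)   # drop leading directories
        return path
    }
    # First file: remember the checksum recorded for each file name.
    FNR == NR { sum[basename($2)] = $1; next }
    # Second file: look the file name up and compare checksums.
    {
        name = basename($2)
        if (!(name in sum))
            printf "%s is missing from the first list\n", name
        else if (sum[name] != $1)
            printf "%s has a different md5sum\n", name
        else
            printf "%s matches\n", name
    }
    ' "$1" "$2"
}
```

It would be called as compare_md5_lists downloaded.list server.list. Keying the array on the file name rather than the checksum is a design choice: it lets the script distinguish a file that is absent from one whose checksum changed.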
The whitespace on $1 used as an array key is causing problems. Removing it:
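The code that followed this sentence is not shown here; the sketch below is one way to do it while keeping the question's -F"/" separator, and is an assumption rather than the answerer's exact command. The function name report_md5_mismatches is hypothetical. With / as the separator, $1 is the checksum plus whatever follows it up to the first slash (a blank, two blanks, or even " tmp" for a path without a leading /), so the key is cut at the first space or tab before it is stored or looked up.

```shell
# Report files in the second list whose checksum does not occur in the first.
# $1: reference list, $2: list to check against it.
report_md5_mismatches() {
    awk -F"/" '
    {
        key = $1
        sub(/[ \t].*/, "", key)                  # keep only the checksum
        name = ($NF == "" ? $(NF - 1) : $NF)     # tolerate a trailing "/"
    }
    FNR == NR { seen[key] = name; next }         # first file: record checksums
    !(key in seen) { printf "%s has a different md5sum\n", name }
    ' "$1" "$2"
}
```

It would be called as report_md5_mismatches downloaded.list server.list. Like the original command, this only checks that each checksum occurs somewhere in the first list; it does not verify that it belongs to the same file name.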