How to compare directories to determine which files have changed?

Posted on 2024-10-17 12:07:39

We need a script that compares two directories of files and, for every file that has changed between directory 1 and directory 2 (added, deleted, or modified), collects those changed files into a separate subset.

My first impression is to write a Python script that traverses each directory, computes a hash of each file, and, if the hash has changed, copies the file over to the new subset of files. Is this a proper approach? Am I neglecting any tools out there that may do this already? I've never used it, but maybe something like rsync could be used?

Thanks

Edit:

The important part is that I am able to compile a subset of only those files that were changed, so if only 3 files have changed between versions, I only need those three files copied to a new directory...

Comments (5)

南街女流氓 2024-10-24 12:07:39

It seems to me that you need something as simple as this:

from os import listdir
from os.path import getmtime, join

rep1 = 'I:\\dada'
rep2 = 'I:\\didi'

R1 = listdir(rep1)
R2 = listdir(rep2)

# Names only in rep1, names only in rep2, and names present in both
# whose modification times differ.
vanished = [filename for filename in R1 if filename not in R2]
appeared = [filename for filename in R2 if filename not in R1]
modified = [filename for filename in R2 if filename in R1
            and getmtime(join(rep1, filename)) != getmtime(join(rep2, filename))]

print('vanished==', vanished)
print('appeared==', appeared)
print('modified==', modified)
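
Modification times can differ even when the contents are identical (for example after a fresh copy or checkout), so comparing the bytes may be safer. Below is a minimal sketch using the standard library's filecmp module. The function name compare_flat and its return layout are my own choices for illustration, and like the snippet above it only looks at files directly inside the two directories.

```python
import filecmp
import os


def compare_flat(dir1, dir2):
    """Classify the regular files directly inside dir1 and dir2 (no recursion)."""
    names1 = {n for n in os.listdir(dir1) if os.path.isfile(os.path.join(dir1, n))}
    names2 = {n for n in os.listdir(dir2) if os.path.isfile(os.path.join(dir2, n))}

    vanished = sorted(names1 - names2)  # only in dir1
    appeared = sorted(names2 - names1)  # only in dir2

    # shallow=False makes cmpfiles look at file contents rather than
    # trusting os.stat() signatures (file type, size, modification time).
    _match, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, sorted(names1 & names2), shallow=False
    )
    # errors holds names that could not be compared (e.g. unreadable files).
    return vanished, appeared, sorted(mismatch), sorted(errors)
```

Calling compare_flat(rep1, rep2) returns four sorted lists: vanished, appeared, modified, and the names that could not be compared.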
相思碎 2024-10-24 12:07:39

That is one completely reasonable approach, but you are essentially reinventing rsync. So yes, use rsync.

Edit: There is a way to create "difference-only" folders using rsync.
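
One way this is commonly done is with rsync's --compare-dest option, which skips files that already exist unchanged in a reference directory. The sketch below only builds and runs such a command from Python. The function names and the choice of -a and --checksum are my assumptions, not something stated in this answer, and rsync must be installed for run_rsync_diff to work.

```python
import os
import subprocess


def rsync_diff_command(old_dir, new_dir, out_dir):
    """Build an rsync command that copies into out_dir only the files of
    new_dir that are missing from, or different to, old_dir."""
    # A relative --compare-dest path is interpreted relative to the
    # destination directory, so pass an absolute path to avoid surprises.
    old_abs = os.path.abspath(old_dir)
    return [
        "rsync",
        "-a",          # archive mode: recurse and preserve metadata
        "--checksum",  # decide by content checksum, not size and mtime
        "--compare-dest=" + os.path.join(old_abs, ""),
        os.path.join(new_dir, ""),  # trailing separator: copy the contents
        os.path.join(out_dir, ""),
    ]


def run_rsync_diff(old_dir, new_dir, out_dir):
    """Run the command; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_diff_command(old_dir, new_dir, out_dir), check=True)
```

Note that this only captures added and modified files. Files deleted from the new directory do not show up in the output folder, and rsync may still create empty directories there.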

微凉徒眸意 2024-10-24 12:07:39

I like diffmerge, it works great for this purpose.

心房敞 2024-10-24 12:07:39

I have modified @eyquem's answer a bit!

Arguments can be given as

python file.py dir1 dir2

NOTE: files are classified by modification time!

#!/usr/bin/env python3
import sys
from os import listdir
from os.path import getmtime, join

ORIG_DIR = sys.argv[1]      # orig:-->/root/backup.FPSS/bin/httpd
MODIFIED_DIR = sys.argv[2]  # modified-->/FPSS/httpd/bin/httpd

LIST_OF_FILES_IN_ORIG_DIR = listdir(ORIG_DIR)
LIST_OF_FILES_IN_MODIFIED_DIR = listdir(MODIFIED_DIR)
COMMON_FILES = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f in LIST_OF_FILES_IN_ORIG_DIR]

vanished = [f for f in LIST_OF_FILES_IN_ORIG_DIR if f not in LIST_OF_FILES_IN_MODIFIED_DIR]
appeared = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f not in LIST_OF_FILES_IN_ORIG_DIR]
# A common file counts as modified only when its copy in MODIFIED_DIR is newer.
modified = [f for f in COMMON_FILES if getmtime(join(ORIG_DIR, f)) < getmtime(join(MODIFIED_DIR, f))]
same = [f for f in COMMON_FILES if getmtime(join(ORIG_DIR, f)) >= getmtime(join(MODIFIED_DIR, f))]

def print_list(title, names):
    # Print the heading first, then each entry, then the count.
    print(title, '==>')
    for f in names:
        print('----->', f)
    print('Total ::', len(names))

RULE = '-' * 101
print('#' * 99)
print_list(f'Files which have Vanished from MOD: {MODIFIED_DIR} but still present {ORIG_DIR}', vanished)
print(RULE)
print_list(f'Files which are Appearing in MOD: {MODIFIED_DIR} but not present {ORIG_DIR}', appeared)
print(RULE)
print_list(f'Files in MOD: {MODIFIED_DIR} which are MODIFIED if compared to ORIG: {ORIG_DIR}', modified)
print(RULE)
print_list(f'Files in MOD: {MODIFIED_DIR} which are NOT modified if compared to ORIG: {ORIG_DIR}', same)
print('#' * 99)
挽手叙旧 2024-10-24 12:07:39

Including subfolders and comparing hashes of the files (Python 3.11 or later required)

from os import walk
from os.path import join, normpath, relpath
import hashlib

rep1 = normpath(input('Folder 1: '))
rep2 = normpath(input('Folder 2: '))

def file_hash(path):  # hashlib.file_digest only exists from Python 3.11 on
    with open(path, 'rb') as f:
        return hashlib.file_digest(f, 'sha256').hexdigest()

def hashcheck(fileloc1, fileloc2):
    # True if the two files differ in content
    return file_hash(fileloc1) != file_hash(fileloc2)

def list_files(root):
    # Paths of all files below root, relative to root
    return [relpath(join(folder, item), root)
            for folder, _, files in walk(root) for item in files]

R1 = list_files(rep1)
R2 = list_files(rep2)

vanished = [pathname for pathname in R1 if pathname not in R2]
appeared = [pathname for pathname in R2 if pathname not in R1]
modified = [pathname for pathname in R2 if pathname in R1
            and hashcheck(join(rep1, pathname), join(rep2, pathname))]

print('vanished==', vanished, '\n')
print('appeared==', appeared, '\n')
print('modified==', modified, '\n')
input()  # keep the console window open until Enter is pressed
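
The script above lists the differences but does not yet build the subset the question asks for. Here is a rough sketch of that last step, assuming the three directories do not overlap. The names sha256_of, relative_files and copy_changed are hypothetical helpers of mine, and the hash is computed in chunks so it should also run on Python versions before 3.11.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root):
    """Return the set of file paths below root, relative to root."""
    found = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def copy_changed(old_dir, new_dir, out_dir):
    """Copy files that are new or modified in new_dir into out_dir,
    keeping their relative paths. Returns (vanished, appeared, modified)."""
    old_files = relative_files(old_dir)
    new_files = relative_files(new_dir)

    vanished = sorted(old_files - new_files)
    appeared = sorted(new_files - old_files)
    modified = sorted(
        rel for rel in old_files & new_files
        if sha256_of(os.path.join(old_dir, rel)) != sha256_of(os.path.join(new_dir, rel))
    )

    for rel in appeared + modified:
        target = os.path.join(out_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(new_dir, rel), target)  # also copies timestamps
    return vanished, appeared, modified
```

Vanished files cannot be copied because they no longer exist in the new directory, so they are only reported in the return value.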