Compare files with the same name in 2 folders in Python and check their sizes to delete the larger one

Posted on 2025-01-11 05:44:33


Hello, I'm cleaning up my computer, and I found myself feeding a huge list of files to Handbrake to compress them. After compression, some files ended up bigger than the originals. I want to clean that up, so I tried to write a small Python script.

Basically, I have 2 folders containing files with the same names but different sizes. I want to compare each pair and delete the bigger file, so that when I merge the folders I keep only the smaller version of each file.

Here is an example of the folders I have:

- test/Original
 file1.mpg 40 MB
 file2.mpg 2 MB
 file3.mpg 400 MB
 file4.mpg 45 MB

- test/Compressed
 file1.mpg 20 MB
 file2.mpg 2 MB
 file3.mpg 200 MB
 file4.mpg 105 MB

At the end of the script I'd like to have this (or a third folder with the merged result):

- test/Original
 file4.mpg 45 MB

- test/Compressed
 file1.mpg 20 MB
 file2.mpg 2 MB
 file3.mpg 200 MB

I wrote this code and it seems to work, but I'd like to know if there's a better way to do this. I heard of a function called filecompare, but I don't understand whether I can get the file size from it.
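If "filecompare" means Python's standard `filecmp` module (an assumption), it answers a different question: `filecmp.cmp()` reports whether two files have the same contents, not how big they are. Its `dircmp` class can still list the names that exist as regular files in both folders, and `os.path.getsize()` provides the sizes. Here is a minimal, report-only sketch. `larger_duplicates` is a hypothetical helper name.

```python
import filecmp
import os


def larger_duplicates(dir_a, dir_b):
    """Report the path of the larger copy for each name that exists as a
    regular file in both folders. Nothing is deleted.

    On a size tie the copy in dir_a is reported, like the ">=" in the question.
    """
    # dircmp.common_files: names that are regular files in both directories
    common = filecmp.dircmp(dir_a, dir_b).common_files
    larger = []
    for name in sorted(common):
        path_a = os.path.join(dir_a, name)
        path_b = os.path.join(dir_b, name)
        if os.path.getsize(path_a) >= os.path.getsize(path_b):
            larger.append(path_a)
        else:
            larger.append(path_b)
    return larger
```

For example, `larger_duplicates("test/Original", "test/Compressed")` returns the paths you would pass to `os.remove()`. With the example sizes above, it would list file1, file2 and file3 from test/Original and file4 from test/Compressed.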

Also, I don't understand why I get an indentation error when I uncomment the commented-out line.

import os

dirA = 'test/a'
dirB = 'test/b'
merged = []

with os.scandir(dirA) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)

with os.scandir(dirB) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)


for i in range(len(merged)):
#   print('-------------iterating over %s' % (merged[i].name,merged[i].stat().st_size/1024**2))
    for j in range(i + 1, len(merged)):
        if str(merged[i].name) == str(merged[j].name):
            print('----DUPLICATE %s %.2f Mb = %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2, merged[j].name, merged[j].stat().st_size/1024**2))
            if merged[i].stat().st_size >= merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2))
                os.remove(merged[i])
            elif merged[i].stat().st_size < merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[j].name, merged[j].stat().st_size/1024**2))
                os.remove(merged[j])
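About the indentation error: deleting only the `#` leaves `   print(...)` indented by three spaces, while the `for j ...` line after it uses four. The first statement of a block sets that block's indentation, so Python treats the four-space line as an unexpected indent. The same line has a second bug waiting behind the first. Its format string has one `%s` but receives two values, which raises `TypeError`. The sketch below reproduces both problems on small stand-in snippets. The snippets and the `check_indentation` helper are illustrative only, not the original script.

```python
def check_indentation(source):
    """Compile source without running it; return the IndentationError message or None."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


# Uncommenting "#   print(...)" leaves 3 spaces; the next line uses 4.
broken = (
    "for i in range(3):\n"
    "   print(i)\n"
    "    for j in range(i + 1, 3):\n"
    "        pass\n"
)
# Indenting the print call by 4 spaces makes the block consistent.
fixed = (
    "for i in range(3):\n"
    "    print(i)\n"
    "    for j in range(i + 1, 3):\n"
    "        pass\n"
)

print(check_indentation(broken))  # unexpected indent
print(check_indentation(fixed))   # None

# The original line also passes two values to a format string with one %s.
name, size_mb = "file1.mpg", 40.0
try:
    message = "iterating over %s" % (name, size_mb)
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
print("iterating over %s %.2f Mb" % (name, size_mb))
```

So the fix is to indent the `print` call by four spaces and give the format string one placeholder per value, e.g. `'%s %.2f Mb'`.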


Answer by 公布 (2025-01-18 05:44:33)


Deleting files based on size

This is a simple procedure and can be implemented with a few small functions.

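import os


def compare_folders(path1, path2):
    ignore = {".DS_Store"}  # macOS metadata; os.listdir() never returns "." or ".."
    files2 = set(os.listdir(path2))  # list path2 once, not once per file
    for file in os.listdir(path1):
        full1 = os.path.join(path1, file)
        full2 = os.path.join(path2, file)
        # only compare regular files present in both folders (subfolders are skipped)
        if file in files2 and file not in ignore and os.path.isfile(full1) and os.path.isfile(full2):
            delete_larger_file(full1, full2)


def merge_folders(path1, path2):
    files2 = set(os.listdir(path2))
    for file in os.listdir(path1):
        if file not in files2:
            # move every entry of path1 that has no twin in path2
            os.rename(os.path.join(path1, file), os.path.join(path2, file))


def delete_larger_file(path1, path2):
    # delete the larger file; on a tie the file at path2 is deleted
    if os.path.getsize(path1) > os.path.getsize(path2):
        os.remove(path1)
    else:
        os.remove(path2)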

What's going on here?

  • The first function, compare_folders(), takes the paths of the two folders to compare. It iterates over the files in the first folder and, for every name that also exists in the second folder, calls delete_larger_file(), which compares the sizes of the 2 files and deletes the larger one.
  • A subsequent call to merge_folders() merges the folders in place: every file in the first folder that has no counterpart in the second folder is moved into the second folder. In the end, the first folder should be empty and the second should contain the smaller copy of every file.
  • Be warned: deleted files cannot be recovered, so test this on a copy of your data first. Also, subfolders are not handled; supporting them would require recursion.
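One way to lift the subfolder limitation is to walk the first tree with `os.walk()` and look up each file's relative path in the second tree. This is an assumption sketched here, not part of the answer. `compare_trees` is a hypothetical name. Like `delete_larger_file()`, it deletes the second file on a size tie, and it returns the paths it removed.

```python
import os


def compare_trees(root1, root2, ignore=(".DS_Store",)):
    """Hypothetical recursive variant of compare_folders().

    For every file under root1 whose relative path also exists as a file
    under root2, delete the larger of the two (root2's copy on a tie).
    Returns the list of deleted paths.
    """
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root1):
        rel_dir = os.path.relpath(dirpath, root1)  # "." for root1 itself
        for name in filenames:
            if name in ignore:
                continue
            path1 = os.path.join(dirpath, name)
            path2 = os.path.normpath(os.path.join(root2, rel_dir, name))
            if not os.path.isfile(path2):
                continue  # no counterpart in the second tree
            if os.path.getsize(path1) > os.path.getsize(path2):
                victim = path1
            else:
                victim = path2
            os.remove(victim)
            deleted.append(victim)
    return deleted
```

Moving the leftover files would need the same relative-path treatment, including creating missing subfolders with `os.makedirs(..., exist_ok=True)`. This sketch leaves that step out.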

Call compare_folders() first, then merge_folders().
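Because none of this can be undone, a dry run helps. The sketch below is an assumption, not part of the answer. It uses `pathlib` to work out what compare_folders() and merge_folders() would do, without touching the disk. `plan_cleanup` and `apply_cleanup` are hypothetical names. Ties are resolved as in `delete_larger_file()`: the second folder's copy is deleted.

```python
from pathlib import Path


def plan_cleanup(dir1, dir2, ignore=(".DS_Store",)):
    """Work out the cleanup without changing anything on disk.

    Returns (to_delete, to_move):
      to_delete: the larger file of each same-name pair (dir2's copy on a tie)
      to_move:   (source, destination) pairs moving dir1's survivors into dir2
    """
    dir1, dir2 = Path(dir1), Path(dir2)
    to_delete, to_move = [], []
    for f1 in sorted(dir1.iterdir()):
        if not f1.is_file() or f1.name in ignore:
            continue  # subfolders and ignored names are left alone
        f2 = dir2 / f1.name
        if f2.is_file():
            if f1.stat().st_size > f2.stat().st_size:
                to_delete.append(f1)  # dir1's copy is larger: keep dir2's
                continue
            to_delete.append(f2)  # dir2's copy is larger or equal
        elif f2.exists():
            continue  # name taken by a non-file in dir2: skip it
        to_move.append((f1, f2))  # dir1's surviving copy moves into dir2
    return to_delete, to_move


def apply_cleanup(to_delete, to_move):
    """Carry out a plan produced by plan_cleanup()."""
    for path in to_delete:
        path.unlink()
    for src, dst in to_move:
        src.rename(dst)
```

For instance, call `plan_cleanup("test/a", "test/b")`, print both lists, and pass them to `apply_cleanup()` only once they look right.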

Answer by 与他有关 (2025-01-18 05:44:33)


I'm posting the complete code in case anyone needs it. Thanks to @drow339 for it!

import os

path1 = 'test1/a'
path2 = 'test1/b'

print('Comparing the folders: %s and %s' % (path1, path2))


def compare_folders(path1, path2):
    ignore = [".", "..", ".DS_Store"]  # ignore these pointers/ files
    for file in os.listdir(path1):
        print('Checking duplicates for %s' % file)
        if file in os.listdir(path2):
            print('ok')
            if file not in ignore:
                print('Duplicate found: %s <---------' % file)
                delete_larger_file(path1 + "/" + file, path2 + "/" + file)


def delete_larger_file(path1, path2):
    if os.path.getsize(path1) >= os.path.getsize(path2):
        print('Duplicates: %s - %s - deleting the first one' % (path1, path2))
        os.remove(path1)
    else:
        print('Duplicates: %s - %s - deleting the second one' % (path1, path2))
        os.remove(path2)


def merge_folders(path1, path2):
    for file in os.listdir(path1):
        if file not in os.listdir(path2):
            os.rename(path1 + "/" + file, path2 + "/" + file)


compare_folders(path1, path2)
merge_folders(path1, path2)
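The question also mentioned a third folder holding the merged result. Here is a non-destructive sketch of that idea; it is an assumption, not code from the thread. `merge_smallest` is a hypothetical helper that copies the smaller copy of each file into a new folder with `shutil.copy2()` and leaves both source folders untouched. On a size tie it keeps the copy from the second folder. If the compressed folder is passed second, that keeps test/Compressed/file2.mpg, as in the question's expected result.

```python
import os
import shutil


def merge_smallest(dir_a, dir_b, out_dir):
    """Hypothetical non-destructive merge: copy the smaller copy of every
    file into out_dir and leave dir_a and dir_b untouched.

    Files that exist in only one folder are copied as-is; on a size tie the
    copy from dir_b wins. Returns a dict mapping name -> source path used.
    """
    os.makedirs(out_dir, exist_ok=True)
    chosen = {}  # name -> (path, size) of the smallest copy seen so far
    for folder in (dir_a, dir_b):
        with os.scandir(folder) as entries:
            for entry in entries:
                if not entry.is_file() or entry.name == ".DS_Store":
                    continue
                size = entry.stat().st_size
                best = chosen.get(entry.name)
                # "<=" lets dir_b win ties because it is scanned second
                if best is None or size <= best[1]:
                    chosen[entry.name] = (entry.path, size)
    for name, (src, _size) in chosen.items():
        shutil.copy2(src, os.path.join(out_dir, name))
    return {name: src for name, (src, _size) in chosen.items()}
```

For example, `merge_smallest("test/Original", "test/Compressed", "test/Merged")`. Once the merged folder looks right, the two source folders can be deleted by hand.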