Getting the percentage complete of an md5 checksum

Posted on 2025-01-03 08:21:42

I am currently getting an md5 checksum as follows:

>>> import hashlib
>>> f = open(file, 'rb')
>>> m = hashlib.md5()
>>> m.update(f.read())
>>> checksum = m.hexdigest()

I need to return the checksum of a large video file, which will take several minutes to generate. How would I implement a percentage counter, so that it prints the percentage complete at each percent while it is running? Something like:

>>> checksum = m.hexdigest()
1% done...
2% done...
etc.

Comments (3)

爺獨霸怡葒院 2025-01-10 08:21:42

You can call the update() method repeatedly and feed the file in chunks to it. Thus, you can show the progress yourself.

import hashlib
import os

def digest_with_progress(filename, chunk_size):
    read_size = 0
    last_percent_done = 0
    digest = hashlib.md5()
    total_size = os.path.getsize(filename)

    data = True
    # Open in binary mode so the digest sees the raw bytes of the file.
    f = open(filename, 'rb')
    while data:
        # Read the next chunk and update the digest.
        data = f.read(chunk_size)
        read_size += len(data)
        digest.update(data)

        # Calculate progress as an integer percentage.
        percent_done = 100 * read_size // total_size
        if percent_done > last_percent_done:
            print('%d%% done' % percent_done)
            last_percent_done = percent_done
    f.close()
    return digest.hexdigest()

When I try print(digest_with_progress('/bin/bash', 1024)) this is what I get:

1% done
2% done
3% done
4% done
5% done
6% done
7% done
8% done
9% done
10% done
11% done
12% done
13% done
14% done
15% done
16% done
17% done
18% done
19% done
20% done
21% done
22% done
23% done
24% done
25% done
26% done
27% done
28% done
29% done
30% done
31% done
32% done
33% done
34% done
35% done
36% done
37% done
38% done
39% done
40% done
41% done
42% done
43% done
44% done
45% done
46% done
47% done
48% done
49% done
50% done
51% done
52% done
53% done
54% done
55% done
56% done
57% done
58% done
59% done
60% done
61% done
62% done
63% done
64% done
65% done
66% done
67% done
68% done
69% done
70% done
71% done
72% done
73% done
74% done
75% done
76% done
77% done
78% done
79% done
80% done
81% done
82% done
83% done
84% done
85% done
86% done
87% done
88% done
89% done
90% done
91% done
92% done
93% done
94% done
95% done
96% done
97% done
98% done
99% done
100% done
b114ecaab65bc5b02f5a129bd29d1864

Here are the actual details of this file.

$ ls -l /bin/bash; md5sum /bin/bash
-rwxr-xr-x 1 root root 971384 Nov 30 16:31 /bin/bash
b114ecaab65bc5b02f5a129bd29d1864  /bin/bash

Note that you would not get the expected output if you make chunk_size too large. For example, if we read /bin/bash in 100 KB chunks instead of 1 KB chunks, this is what you see.

10% done
21% done
31% done
42% done
52% done
63% done
73% done
84% done
94% done
100% done
b114ecaab65bc5b02f5a129bd29d1864

The limitation of this approach is that progress is only calculated after a whole chunk has been read and fed into the digest. So, if the chunk size is too large, the progress can jump by more than 1% between one update and the next. On the other hand, a bigger chunk size gets the job done a bit quicker, so you might want to relax the condition of printing every single percentage in favour of efficiency.
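
For example, here is a minimal sketch of that relaxed condition. The function name digest_with_coarse_progress, the 1 MB default chunk size and the 5% reporting step are illustrative choices of mine, not part of the answer above.

import hashlib
import os

def digest_with_coarse_progress(filename, chunk_size=1024 * 1024, step=5):
    # Same chunked loop as above, but only report every `step` percent.
    digest = hashlib.md5()
    total_size = os.path.getsize(filename)
    read_size = 0
    next_report = step

    with open(filename, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            digest.update(data)
            read_size += len(data)
            percent_done = 100 * read_size // total_size
            if percent_done >= next_report:
                print('%d%% done' % percent_done)
                # Skip ahead so the same band is never reported twice.
                next_report = (percent_done // step + 1) * step
    return digest.hexdigest()

# For example: print(digest_with_coarse_progress('/bin/bash'))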

似最初 2025-01-10 08:21:42

You should read the file in chunks with f.read(N_BYTES), keep track of how far in the file you are, and pass the chunks to m.update. That's the expensive operation, not md5.hexdigest.
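
A minimal sketch of that idea, assuming a hypothetical helper md5_with_progress and an arbitrary N_BYTES of 64 KB (neither name is given in the answer above):

import hashlib
import os

N_BYTES = 64 * 1024  # arbitrary chunk size for this sketch

def md5_with_progress(path):
    m = hashlib.md5()
    total = os.path.getsize(path)
    done = 0
    last = -1
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(N_BYTES)
            if not chunk:
                break
            m.update(chunk)        # the expensive part, fed chunk by chunk
            done += len(chunk)     # how far into the file we are
            percent = 100 * done // total
            if percent != last:    # only print when the whole percent changes
                print('%d%% done' % percent)
                last = percent
    return m.hexdigest()           # cheap: just formats the finished digest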

臻嫒无言 2025-01-10 08:21:42

Well, it's not the hexdigest() call that will take a while; it's the reading of the file that will.

With this in mind, replace m.update(f.read()) with a loop where you read the file block by block, update the checksum, and periodically print out a progress report.
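
A rough sketch of such a loop, assuming a hypothetical 256 KB block size and a once-per-second progress report (both arbitrary choices, not from the answer above):

import hashlib
import os
import time

def checksum_with_reports(path, block_size=256 * 1024, interval=1.0):
    m = hashlib.md5()
    total = os.path.getsize(path)
    done = 0
    last_report = time.monotonic()
    with open(path, 'rb') as f:
        # iter() with a b'' sentinel stops the loop at end of file.
        for block in iter(lambda: f.read(block_size), b''):
            m.update(block)
            done += len(block)
            now = time.monotonic()
            if now - last_report >= interval:
                print('%.0f%% done' % (100 * done / total))
                last_report = now
    return m.hexdigest()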
