Get the percentage complete of an md5 checksum
I am currently getting an md5 checksum as follows:
>>> import hashlib
>>> f = open(file, 'rb')
>>> m = hashlib.md5()
>>> m.update(f.read())
>>> checksum = m.hexdigest()
I need to return the checksum of a large video file, which will take several minutes to generate. How would I implement a percentage counter that prints each percentage point as it is reached while the checksum is being computed? Something like:
>>> checksum = m.hexdigest()
1% done...
2% done...
etc.
Comments (3)
You can call the update() method repeatedly and feed the file to it in chunks. That way, you can report the progress yourself.
When I try
print(digest_with_progress('/bin/bash', 1024))
this is what I get:
Here are the actual details of this file.
Note that you will not get the expected output if you make chunk_size too large. For example, if we read /bin/bash in 100 KB chunks instead of 1 KB chunks, this is what you see.
The limitation of this approach is that progress is calculated only after a chunk has been read into the digest. So, if the chunk size is too large, the progress jumps by more than 1% every time you read a chunk and update the digest. A bigger chunk size would get the job done a bit quicker, so you might want to relax the requirement of printing every single percentage in favour of efficiency.
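The definition of digest_with_progress is not shown above, so the following is only a minimal sketch of what such a function might look like. It assumes the signature digest_with_progress(filename, chunk_size), takes the file size from os.path.getsize, and prints a line whenever the whole-number percentage increases; none of these details are confirmed by the answer.

```python
import hashlib
import os


def digest_with_progress(filename, chunk_size):
    """Return the MD5 hex digest of filename, printing progress along the way.

    Hypothetical reconstruction: the name and signature follow the call shown
    in the answer, but the body is an assumption.
    """
    total = os.path.getsize(filename)  # total bytes, used as the 100% mark
    m = hashlib.md5()
    bytes_read = 0
    last_percent = 0
    with open(filename, "rb") as f:  # binary mode so the raw bytes are hashed
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            m.update(chunk)
            bytes_read += len(chunk)
            percent = 100 * bytes_read // total
            if percent > last_percent:  # print only when the percentage moves
                print(f"{percent}% done...")
                last_percent = percent
    return m.hexdigest()
```

With a small chunk_size relative to the file, every percentage from 1 to 100 is printed. With a chunk larger than 1% of the file, some percentages are skipped, which is the limitation described above.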
You should read the file in chunks with f.read(N_BYTES), keep track of how far into the file you are, and pass the chunks to m.update. That's the expensive operation, not md5.hexdigest.
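Feeding the file in pieces is safe because repeated update() calls are equivalent to a single call on the concatenated data. The short sketch below illustrates this; the payload and the N_BYTES value of 64 are arbitrary stand-ins chosen for the example, not values from the answer.

```python
import hashlib

data = b"example payload " * 1000  # stand-in for the contents of a file

# One-shot: hash everything with a single update() call.
one_shot = hashlib.md5()
one_shot.update(data)

# Chunked: feed the same bytes N_BYTES at a time.
N_BYTES = 64  # arbitrary chunk size for illustration
chunked = hashlib.md5()
for start in range(0, len(data), N_BYTES):
    chunked.update(data[start:start + N_BYTES])

# Both objects have consumed the same byte stream, so the digests match.
print(one_shot.hexdigest() == chunked.hexdigest())  # True
```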
Well, it's not the hexdigest() call that will take a while; it's the reading of the file.
With this in mind, replace m.update(f.read()) with a loop in which you read the file block by block, update the checksum, and periodically print a progress report.
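One way to keep such a loop reusable is to hand the progress to a callback, so the caller decides how often and in what form to report it. This is only a sketch under assumptions: md5_with_callback, on_progress, report, and the 1 MiB default block size are hypothetical names and values, not part of the answer.

```python
import hashlib
import os


def md5_with_callback(path, on_progress, block_size=1 << 20):
    """Hash path block by block, calling on_progress(done, total) per block."""
    total = os.path.getsize(path)
    m = hashlib.md5()
    done = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file
                break
            m.update(block)
            done += len(block)
            on_progress(done, total)  # the caller decides what to print
    return m.hexdigest()


def report(done, total):
    """Example callback: print the whole-number percentage after each block."""
    print(f"{100 * done // total}% done...")


# Usage, with a hypothetical file name:
# checksum = md5_with_callback("video.mp4", report)
```

Separating the reporting from the hashing means the same loop can drive a console message, a log line, or a GUI progress bar without changes.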