Resuming a large file write in Python

Posted 2024-09-11 01:09:15

I have a big file to transfer (say 4 GB or so). Rather than using shutil, I'm opening the files and writing the data myself, chunk by chunk, so I can show a progress percentage as it goes.
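A minimal sketch of that kind of chunked copy with a progress percentage, written for Python 3 (the snippet further down uses Python 2 print syntax). The function name copy_with_progress and its report callback are hypothetical, chosen for illustration. The key detail is counting len(chunk) rather than the requested chunk size, because the last read is usually shorter.

```python
import os
import sys


def copy_with_progress(src, dst, chunk_size=64 * 1024, report=None):
    """Hypothetical helper: copy src to dst in chunks, reporting progress.

    `report`, if given, is called with the percentage copied so far;
    otherwise the percentage is written to stdout on a single line.
    Returns the number of bytes copied.
    """
    total = os.path.getsize(src)
    copied = 0
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        while True:
            chunk = fsrc.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fdst.write(chunk)
            copied += len(chunk)  # the last chunk may be shorter than chunk_size
            percent = 100.0 * copied / total if total else 100.0
            if report is not None:
                report(percent)
            else:
                sys.stdout.write('\r%5.1f%%' % percent)
                sys.stdout.flush()
    if report is None:
        sys.stdout.write('\n')
    return copied
```

Calling copy_with_progress('big.iso', 'copy.iso') prints a percentage that updates in place; passing report=some_function lets a GUI or logger consume the values instead.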

It then occurred to me to try resuming the write if the transfer fails partway through for some reason. I presumed it would be some clever combination of offsetting the read of the source file and using seek, but I haven't had any luck so far. Any ideas?

Additionally, is there some dynamic way to figure out what block size to use when reading and writing files? I'm fairly new to that area and have only read that larger files call for a larger size (I'm using 65536 at the moment). Is there a smart way to do it, or does one simply guess? Thanks, guys.
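One way to avoid guessing is to measure: copy a representative file with a few candidate chunk sizes and compare the times. The sketch below (Python 3; the function name time_chunk_sizes is hypothetical) does exactly that. Treat its numbers with caution: after the first pass the source is usually in the OS page cache, and the writes are not fsync'ed, so the timings mostly reflect memory speed unless the test file is larger than RAM.

```python
import os
import tempfile
import time


def time_chunk_sizes(path, sizes, repeats=3):
    """Hypothetical helper: return {chunk_size: best_seconds} for copying path.

    Each size is tried `repeats` times and the fastest run is kept.
    The copy goes to a scratch file that is deleted afterwards.
    """
    results = {}
    scratch_dir = tempfile.mkdtemp()
    dst = os.path.join(scratch_dir, 'copy.tmp')
    try:
        for size in sizes:
            best = None
            for _ in range(repeats):
                start = time.perf_counter()
                with open(path, 'rb') as fsrc, open(dst, 'wb') as fdst:
                    while True:
                        chunk = fsrc.read(size)
                        if not chunk:
                            break
                        fdst.write(chunk)
                elapsed = time.perf_counter() - start
                best = elapsed if best is None else min(best, elapsed)
            results[size] = best
    finally:
        # Clean up the scratch copy and its directory.
        if os.path.exists(dst):
            os.remove(dst)
        os.rmdir(scratch_dir)
    return results
```

For example, time_chunk_sizes('big.bin', [8192, 65536, 131072, 1 << 20]) returns the best time per size; differences between neighbouring sizes are often small once chunks reach tens of kilobytes.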

Here is the code snippet for the resumed (appending) file transfer:

    newsrc = open(src, 'rb')
    dest_size = os.stat(destFile).st_size
    print 'Dest file exists, resuming at block %s' % dest_size
    newsrc.seek(dest_size)
    newdest = open(destFile, 'a')
    cur_block_pos = dest_size
    # Start copying file
    while True:
        cur_block = newsrc.read(131072)
        cur_block_pos += 131072
        if not cur_block:
            break
        else:
            newdest.write(cur_block)

It does append and start writing, but it ends up writing dest_size more data than it should, probably for reasons that are obvious to the rest of you. Any ideas?
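A minimal resumable-copy sketch (Python 3; the function name resume_copy and its report callback are hypothetical). It fixes two definite problems in the snippet above, though it cannot account for extra data produced by code outside the snippet. First, the destination is opened with 'ab' (binary append) instead of 'a': text mode translates newline bytes on Windows in Python 2, and in Python 3 writing bytes to a text-mode file raises TypeError. Second, the position advances by len(chunk) rather than a fixed 131072, which in the original overshoots on the short last read and again on the final empty read. The sketch assumes the bytes already in the destination are a correct prefix of the source.

```python
import os


def resume_copy(src, dst, chunk_size=128 * 1024, report=None):
    """Hypothetical helper: append the part of src missing from dst.

    Assumes dst (if it exists) holds a correct prefix of src.
    `report`, if given, is called as report(bytes_done, total_bytes).
    Returns the number of bytes written in this call.
    """
    total = os.path.getsize(src)
    # Bytes already present in the destination (0 if it doesn't exist yet).
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    if done > total:
        raise ValueError('destination is larger than source; refusing to resume')
    written = 0
    # 'ab' = append in binary mode: no newline translation, no text encoding.
    with open(src, 'rb') as fsrc, open(dst, 'ab') as fdst:
        fsrc.seek(done)  # skip the bytes the destination already holds
        while True:
            chunk = fsrc.read(chunk_size)
            if not chunk:
                break
            fdst.write(chunk)
            # Advance by the real length; the last read is usually short.
            done += len(chunk)
            written += len(chunk)
            if report is not None:
                report(done, total)
    return written
```

Calling resume_copy again after a completed copy writes nothing and returns 0. After a hard crash the tail of the destination may not be trustworthy, so comparing the last block against the source (or truncating a block before resuming) is a reasonable extra safeguard.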

Comments (1)

只怪假的太真实 2024-09-18 01:09:15

For the second part of your question: drives read and write data in fixed-size sectors, traditionally 512 bytes, and many modern drives use 4096-byte physical sectors. A block size that is a multiple of 4096 is a multiple of both, so it should give the most efficient transfer; beyond that, the exact value doesn't matter much. Keep in mind that the block size you specify is the amount of data held in memory for each read and write, so don't choose something so large that it uses up a lot of your RAM. I think 8K (8192, which is also Python's io.DEFAULT_BUFFER_SIZE) is a common choice, but 64K should be fine too. I don't think the size of the file being transferred matters much when choosing the block size.
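If you want something slightly more dynamic than a fixed number, one heuristic (an assumption, not a rule) is to round your preferred chunk size up to a multiple of the filesystem's preferred I/O size, which Python exposes as os.stat(...).st_blksize on Unix-like systems. The attribute is absent on Windows, so the sketch falls back to io.DEFAULT_BUFFER_SIZE. The function name pick_chunk_size is hypothetical.

```python
import io
import os


def pick_chunk_size(path, target=64 * 1024):
    """Hypothetical helper: round target up to a multiple of the preferred block size."""
    # st_blksize exists on Unix-like systems; fall back where it doesn't (e.g. Windows).
    blksize = getattr(os.stat(path), 'st_blksize', 0) or io.DEFAULT_BUFFER_SIZE
    # Ceiling division, then scale back up, so the chunk is a whole number of blocks.
    return max(blksize, -(-target // blksize) * blksize)
```

With a typical st_blksize of 4096, pick_chunk_size(path) simply returns 65536, which matches the answer's point that any reasonable multiple of the sector size works.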
