Resuming a large file write in Python
I have a big file transfer (say 4 GB or so) and rather than using shutil, I'm just opening and writing it the normal file way so I can include a progress percentage as it moves along.
It then occurred to me to attempt to resume the file write if, for some reason, it borked out during the process. I haven't had any luck though. I presumed it would be some clever combination of offsetting the read of the source file and using seek, but so far I've got nothing. Any ideas?
Additionally, is there some sort of dynamic way to figure out what block size to use when reading and writing files? I'm fairly novice to that area, and have only read that you should use a larger size for larger files (I'm using 65536 at the moment). Is there a smart way to do it, or does one simply guess? Thanks guys.
Here is the code snippet for the appending file transfer:
newsrc = open(src, 'rb')
dest_size = os.stat(destFile).st_size
print 'Dest file exists, resuming at block %s' % dest_size
newsrc.seek(dest_size)
newdest = open(destFile, 'a')
cur_block_pos = dest_size
# Start copying file
while True:
    cur_block = newsrc.read(131072)
    cur_block_pos += 131072
    if not cur_block:
        break
    else:
        newdest.write(cur_block)
It does append and start writing, but then it writes dest_size more data at the end than it should, for reasons that are probably obvious to the rest of you. Any ideas?
1 Answer
For the second part of your question, data is typically read from and written to a hard drive in blocks of 512 bytes. So using a block size that is a multiple of that should give the most efficient transfer. Other than that, it doesn't matter much. Just keep in mind that whatever block size you specify is the amount of data that the I/O operation stores in memory at any given time, so don't choose something so large that it uses up a lot of your RAM. I think 8K (8192) is a common choice, but 64K should be fine. (I don't think the size of the file being transferred matters much when you're choosing the best block size)