在 Python 中有效地将文本添加到非常大的文本文件中

发布于 2024-10-17 04:51:00 字数 650 浏览 3 评论 0原文

我必须在现有但非常大（2 - 10 GB 范围）的文本文件中添加一些任意文本。由于文件太大，我试图避免将整个文件读入内存。但我对于逐行迭代是否过于保守？与当前的方法相比，转向 readlines(sizehint) 方法是否会给我带来很大的性能优势？

最后的删除和移动不太理想，但据我所知，没有办法对线性数据进行这种操作。但我不太精通 Python——也许我可以利用 Python 的一些独特之处来做得更好？

import os
import shutil
def prependToFile(f, text):
    f_temp = generateTempFileName(f)
    inFile  = open(f, 'r')
    outFile = open(f_temp, 'w')    
    outFile.write('# START\n')
    outFile.write('%s\n' % str(text))
    outFile.write('# END\n\n')
    for line in inFile:
        outFile.write(line)
    inFile.close()
    outFile.close()
    os.remove(f)
    shutil.move(f_temp, f)

原文

I have to prepend some arbitrary text to an existing, but very large (2 - 10 GB range) text file. With the file being so large, I'm trying to avoid reading the entire file in to memory. But am I being too conservative with a line-by-line iteration? Would moving to a readlines(sizehint) approach give me much of a performance advantage over my current approach?

The delete-and-move at the end is less than ideal but, as far as I know, there's no way to do this sort of manipulation with linear data, in place. But I'm not so well versed in Python -- maybe there's something unique to Python I can exploit to do this better?

import os
import shutil
def prependToFile(f, text):
    f_temp = generateTempFileName(f)
    inFile  = open(f, 'r')
    outFile = open(f_temp, 'w')    
    outFile.write('# START\n')
    outFile.write('%s\n' % str(text))
    outFile.write('# END\n\n')
    for line in inFile:
        outFile.write(line)
    inFile.close()
    outFile.close()
    os.remove(f)
    shutil.move(f_temp, f)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

泪意 2024-10-24 04:51:00

如果这是在 Windows NTFS 上，您可以插入到文件的中间。（或者有人告诉我，我不是 Windows 开发人员）。

如果这是在 POSIX（Linux 或 Unix）系统上，您应该像其他人所说的那样使用“cat”。 cat 非常高效，使用书中的每一个技巧来获得最佳性能（即，无效复制缓冲区等）。

但是，如果您必须在 python 中执行此操作，则可以通过使用shutil.copyfileobj() 来改进您提供的代码（它需要2个文件句柄）和tempfile.TemporaryFile（创建一个在关闭时自动删除的文件）：

import os
import shutil
import tempfile

def prependToFile(f, text):
    outFile = tempfile.NamedTemporaryFile(dir='.', delete=False)
    outFile.write('# START\n')
    outFile.write('%s\n' % str(text))
    outFile.write('# END\n\n')
    shutil.copyfileobj(file(f, 'r'), outFile)
    os.remove(f)
    shutil.move(outFile.name, f)
    outFile.close()

我认为不需要os.remove(f)，因为shutil.move()将删除f。但是，您应该对此进行测试。此外，“delete=False”可能不需要，但保留它可能是安全的。

If this is on Windows NTFS, you can insert into the middle of a file. (Or so I'm told, I'm not a Windows developer).

If this is on a POSIX (Linux or Unix) system, you should use "cat" as someone else said. cat is wickedly efficient, using every trick in the book to get optimal performance (ie. voids copying buffers, etc.)

However, if you must do it in python, the code you presented could be improved by using shutil.copyfileobj() (which takes 2 file handles) and tempfile.TemporaryFile (create a file that automatically gets deleted on close):

import os
import shutil
import tempfile

def prependToFile(f, text):
    outFile = tempfile.NamedTemporaryFile(dir='.', delete=False)
    outFile.write('# START\n')
    outFile.write('%s\n' % str(text))
    outFile.write('# END\n\n')
    shutil.copyfileobj(file(f, 'r'), outFile)
    os.remove(f)
    shutil.move(outFile.name, f)
    outFile.close()

I think the os.remove(f) isn't needed as shutil.move() will delete f. However, you should test that. Also, the "delete=False" may not be needed but may be safe to leave it.

回复收藏 0 原文