Processing a huge pipe-delimited file
With reference to my previous post,
Remove first line from a delimited file
I was able to process smaller files and remove the first line, but with huge files I run into memory problems because I am reading the whole file and then writing it back again.
Can anybody suggest a better alternative?
Thanks in advance.
Vivek
Comments (2)
You have to read the file line by line (or chunk by chunk) and write it back in place, so that only a small buffer is ever held in memory.
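The snippet that went with this answer is not shown here, so what follows is only a minimal sketch of the idea in Python, not the answerer's code. The function name `remove_first_line_in_place` and the `chunk_size` parameter are made up for illustration. It skips the first line, then copies the rest of the file forward in fixed-size chunks and truncates the leftover tail.

```python
def remove_first_line_in_place(path, chunk_size=1024 * 1024):
    """Drop the first line of `path` while holding at most one chunk in memory.

    Hypothetical helper for illustration; chunk_size is an arbitrary default.
    """
    with open(path, "r+b") as f:
        f.readline()            # consume the first line
        read_pos = f.tell()     # where the data we want to keep begins
        write_pos = 0           # where that data should end up

        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos += len(chunk)

            # Writing never overtakes reading because write_pos < read_pos.
            f.seek(write_pos)
            f.write(chunk)
            write_pos += len(chunk)

        # Cut off the stale bytes left at the end of the file.
        f.truncate(write_pos)
```

Note that an in-place rewrite like this is not crash-safe: if the process dies halfway, the file is left partly shifted. If that matters, streaming into a temporary file and renaming it over the original is probably the safer variant, at the cost of extra disk space.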
To avoid rewriting the entire file just to remove one line, you can maintain an index to the "start" of the file. This index is the position you treat as the beginning of the data, and it is where you start reading from. Periodically, e.g. once a night, you can rewrite the file so that this "start" is where the file actually starts.
This "start" location can be stored in another file or at the start of the existing file.
This means you can progressively "remove" all the lines of a file without rewriting it at all.
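A rough sketch of how this could look in Python, assuming the start offset is kept in a separate sidecar file. The `.start` and `.tmp` suffixes and all function names here are hypothetical, chosen only for the example.

```python
import os
import shutil

def _offset_path(data_path):
    # Hypothetical sidecar file holding the logical start offset as text.
    return data_path + ".start"

def read_start(data_path):
    """Return the stored start offset, or 0 if none has been saved yet."""
    try:
        with open(_offset_path(data_path)) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def write_start(data_path, offset):
    with open(_offset_path(data_path), "w") as f:
        f.write(str(offset))

def remove_first_line(data_path):
    """Logically remove the first live line by advancing the start offset."""
    start = read_start(data_path)
    with open(data_path, "rb") as f:
        f.seek(start)
        line = f.readline()
    write_start(data_path, start + len(line))
    return line

def iter_lines(data_path):
    """Yield the live lines, i.e. everything from the start offset onward."""
    with open(data_path, "rb") as f:
        f.seek(read_start(data_path))
        yield from f

def compact(data_path):
    """Periodic rewrite: keep only the live data and reset the offset to 0."""
    tmp_path = data_path + ".tmp"  # hypothetical temporary file name
    with open(data_path, "rb") as src, open(tmp_path, "wb") as dst:
        src.seek(read_start(data_path))
        shutil.copyfileobj(src, dst)  # streams in chunks from the current position
    os.replace(tmp_path, data_path)
    write_start(data_path, 0)
```

With this layout, `remove_first_line` only touches the small sidecar file, and the expensive copy happens only in `compact`. One caveat: if the process stops between the `os.replace` and the final `write_start`, the stored offset no longer matches the file, so a real implementation would need to guard that step.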