What is the right way to read a text file in Ruby using a fixed-size buffer?
I am trying to work around Ruby's slow File.each_line implementation.
def parse
  @file = File.open(@file)
  @filesize = @file.stat.size
  @buffer_size = 10485
  @offset = @buffer_size
  @counter = 0
  @line_counter = 0
  @file.seek(0, File::SEEK_END)
  while @file.tell > 0
    @file.seek(-@offset, File::SEEK_END)
    buffer = @file.read(@buffer_size)
    buffer_line = buffer.split("\n")[1]
    @counter += 1
    puts @counter
    @offset += @buffer_size
    return if @offset > @filesize
  end
end
I am not sure how to implement a buffer that stores the remaining part of each line.
Comments (1)
A similar problem has been discussed on the ruby-talk mailing list, in a thread called "Fast way to process large files line by line".
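As for the buffer that holds the remaining part of each line, one possible approach is sketched below. It is not taken from the ruby-talk thread. It reads the file forward (not backward from the end, as the code in the question does) in fixed-size chunks and carries any incomplete trailing line over to the next chunk. The method name each_line_buffered and its parameters are hypothetical, chosen only for illustration. The sketch assumes "\n" line endings, and because the file is opened in binary mode, the yielded strings are binary-encoded. Whether this is actually faster than each_line depends on the Ruby version and the data, so it is worth benchmarking.

```ruby
# Hypothetical helper (not from the original post): yields each line of the
# file at `path`, reading `buffer_size` bytes at a time. Lines are yielded
# without their trailing "\n" and are binary-encoded; call force_encoding on
# them if a specific encoding is needed.
def each_line_buffered(path, buffer_size = 10_485)
  return enum_for(:each_line_buffered, path, buffer_size) unless block_given?

  File.open(path, "rb") do |file|
    remainder = String.new # holds the unfinished line from the previous chunk

    # IO#read with a length returns nil at end of file, which ends the loop.
    while (chunk = file.read(buffer_size))
      remainder << chunk

      # A limit of -1 keeps trailing empty strings, so a chunk that ends
      # exactly on "\n" leaves "" as the last element.
      lines = remainder.split("\n", -1)

      # The last element is either an incomplete line or "". Keep it for
      # the next iteration instead of yielding it now.
      remainder = lines.pop

      lines.each { |line| yield line }
    end

    # Whatever is left is the final line of a file with no trailing newline.
    yield remainder unless remainder.empty?
  end
end

# Example usage: count the lines of the file given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  count = 0
  each_line_buffered(ARGV[0]) { |_line| count += 1 }
  puts count
end
```

Popping the last element after every split is what keeps partial lines intact: a line cut in half by a chunk boundary is never yielded until the rest of it arrives. If a single line spans many chunks, the remainder is split again on every read, so very long lines would call for a smarter search for the next newline.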