Fastest way to skip lines while parsing files in Ruby?
I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000-line file. (Obviously this kind of question is geared toward much larger files; I'm just using those smaller numbers for the sake of example.) Since I know it won't be in the first half, is there a quick way of disregarding that information?
Currently I'm using something along the lines of:
    while (buffer = file_in.gets) && file_in.lineno < 600
      next unless file_in.lineno > 500
      # chomp rather than chomp!: chomp! returns nil when there is nothing to strip
      if buffer.chomp.include?(some_string)
        do_func_whatever
      end
    end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
Comments (4)
Generally, you can't avoid reading the file from the start until the line you are interested in, as each line can be of a different length. The one thing you can avoid, though, is loading the whole file into a big array. Just read line by line, counting, and discard lines until you reach what you are looking for. Pretty much like your own example. You can just make it more Rubyish.
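As an illustration of the counting-and-discarding idea, here is a minimal sketch using Enumerable#drop. The StringIO stand-in and the variable names are assumptions made for the demo, not code from the original answer. Note that drop is eager: it reads through to the end of the input and returns every remaining line as an array.

```ruby
require "stringio"

# Stand-in for a real file: 1000 numbered lines held in memory (demo assumption).
file = StringIO.new((1..1000).map { |n| "line #{n}\n" }.join)

# Skip the first 500 lines, then keep the next 100 (lines 501..600).
# drop returns an Array of everything after line 500, so the whole
# input is still read here.
wanted = file.each_line.drop(500).first(100)

puts wanted.first
puts wanted.size
```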
PS. The Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
PS2: Doh, the above code, while not making a huge array, iterates through the whole file, even the lines after line 600. :( Here's a third version:
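A minimal sketch of a version that neither builds a large array nor reads past the last wanted line might look like this. The helper name lines_between and the StringIO stand-in are hypothetical, chosen only for the demo.

```ruby
require "stringio"

# Hypothetical helper: collects lines first..last (1-based, inclusive) and
# stops reading as soon as the last wanted line has been seen.
def lines_between(io, first, last)
  wanted = []
  io.each_line.with_index(1) do |line, number|
    next if number < first      # still before the range: discard
    wanted << line.chomp
    break if number >= last     # done: leave the rest of the input unread
  end
  wanted
end

# Stand-in for a real file (demo assumption).
file = StringIO.new((1..1000).map { |n| "line #{n}\n" }.join)
wanted = lines_between(file, 501, 600)

puts wanted.size
puts file.lineno   # number of lines actually read
```

The break is what keeps the loop from touching lines after the range; the lines before it still have to be read and thrown away.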
or, if you prefer FP:
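One functional-style sketch, assuming Ruby 2.0 or later, uses Enumerator::Lazy so that drop no longer materializes the rest of the file. This swaps in lazy enumeration as the technique and is not necessarily what the answer originally showed; the StringIO stand-in is a demo assumption.

```ruby
require "stringio"

# Stand-in for a real file (demo assumption).
file = StringIO.new((1..1000).map { |n| "line #{n}\n" }.join)

wanted = file.each_line
             .lazy          # evaluate the chain one line at a time
             .drop(500)     # lazily skip lines 1..500
             .first(100)    # pull exactly 100 lines, then stop reading
             .map(&:chomp)

puts wanted.size
```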
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes. See IO#seek, or see IO.read for information on the offset argument.
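Skipping by bytes only helps when the byte offset of the wanted line is known. As a hedged sketch, the demo below assumes fixed-width records (every line exactly 10 bytes, newline included), which is an assumption of the example and not something the question states. RECORD_SIZE and the temporary file are hypothetical.

```ruby
require "tempfile"

RECORD_SIZE = 10 # bytes per line, newline included (assumption of this demo)

first_wanted = nil
chunk = nil

Tempfile.create("fixed_width") do |f|
  # Write 1000 lines of 9 digits plus a newline each.
  (1..1000).each { |n| f.write(format("%09d\n", n)) }
  f.flush

  # Jump straight to the start of line 501 without reading lines 1..500.
  f.seek(500 * RECORD_SIZE, IO::SEEK_SET)
  first_wanted = f.gets

  # File.read accepts a length and an offset, so lines 501..600 can be
  # fetched in a single call.
  chunk = File.read(f.path, 100 * RECORD_SIZE, 500 * RECORD_SIZE)
end

puts first_wanted
puts chunk.lines.size
```

With variable-length lines the same calls still work, but the offset would have to come from somewhere else, such as an index built on an earlier pass.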
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO#readlines, which returns an array with all the lines.
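A minimal sketch of the readlines approach, assuming the file is small enough to hold in memory (the temporary file is only there to make the demo self-contained):

```ruby
require "tempfile"

wanted = nil

Tempfile.create("sample") do |f|
  (1..1000).each { |n| f.puts("line #{n}") }
  f.flush

  # readlines loads every line into an array. Indices are 0-based,
  # so lines 501..600 are elements 500..599.
  wanted = File.readlines(f.path, chomp: true)[500...600]
end

puts wanted.size
```

This is the simplest to write, but it reads and stores the whole file, so it is a poor fit for the very large files the question has in mind.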