Fastest way to skip lines while parsing files in Ruby?

Posted 2024-10-18 05:16:22 | 507 words | 16 views | 0 comments

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.

I was wondering what the fastest way to parse only certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000-line file. (Obviously this kind of question is geared toward much larger files; I'm just using those smaller numbers for the sake of example.) Since I know it won't be in the first half, is there a quick way of disregarding that part of the file?

Currently I'm using something along the lines of:

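while buffer = file_in.gets and file_in.lineno < 600
  next unless file_in.lineno > 500
  # chomp rather than chomp!: chomp! returns nil when there is no
  # trailing newline to strip, and nil.include? would raise
  if buffer.chomp.include?(some_string)
    do_func_whatever
  end
end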

It works, but I just can't help but think it could work better.

I'm very new to Ruby and am interested in learning new ways of doing things in it.

Comments (4)

过度放纵 2024-10-25 05:16:22

file.each_line.drop(500).take(100) # will get you lines 501-600 (IO#lines was removed in Ruby 3.0; each_line is the replacement)

Generally, you can't avoid reading the file from the start up to the line you are interested in, because each line can have a different length. The one thing you can avoid, though, is loading the whole file into a big array. Just read line by line, counting, and discard lines until you reach what you are looking for, pretty much like your own example. You can just make it more Rubyish.
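
A minimal sketch of that read, count, and discard loop, assuming the file is identified by a path. The helper name lines_between is hypothetical and not from the original post. It streams the file with File.foreach and breaks as soon as the last wanted line has been read, so nothing after that line is touched:

```ruby
# Hypothetical helper: returns lines first..last (1-based, inclusive)
# without loading the whole file into memory.
def lines_between(path, first, last)
  wanted = []
  File.foreach(path).with_index(1) do |line, lineno|
    next if lineno < first     # discard lines before the range
    wanted << line.chomp
    break if lineno >= last    # stop reading once the range is complete
  end
  wanted
end
```

With the question's example this could be called as lines_between(path, 501, 600), followed by an include? check on each returned line.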

PS. The Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way to avoid that:

file.each_line.select.with_index(1){|l,i| (501..600) === i} # with_index(1) makes i a 1-based line number

PS2: Doh, the code above, while not building a huge array, still iterates through the whole file, even the lines after 600. :( Here's a third version:

enum = file.each_line
500.times{enum.next} # skip 500
enum.take(100) # take the next 100

Or, if you prefer a more functional style:

file.each_line.tap{|enum| 500.times{enum.next}}.take(100)

Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
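
As one more variation on the versions above, here is a sketch using a lazy enumerator, which skips and takes in a single chain and stops reading once enough lines have been collected. It assumes the file is given as a path, and slice_lines is a hypothetical helper name, not something from the original answer:

```ruby
# Hypothetical helper: skip `skip` lines, then return the next `count` lines.
# lazy keeps drop from building an array of the rest of the file, and
# first(count) stops the iteration as soon as it has enough lines.
def slice_lines(path, skip, count)
  File.foreach(path).lazy.drop(skip).first(count)
end
```

Each returned line still carries its trailing newline, so chomp it if that matters for the comparison.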

╄→承喏 2024-10-25 05:16:22

I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.

See IO#seek, or see IO.read for information on the offset argument.
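
Seeking only helps when the byte offset of the target line is already known. A rough sketch of one way to get there, assuming the same file is queried repeatedly so that a one-time indexing pass pays off. Both helper names are hypothetical, and the file is assumed not to change between indexing and reading:

```ruby
# Hypothetical helper: record the byte offset at which every line starts.
# This still reads the whole file once; the benefit comes on later lookups.
def build_line_offsets(path)
  offsets = []
  File.open(path) do |f|
    until f.eof?
      offsets << f.pos
      f.gets
    end
  end
  offsets
end

# Hypothetical helper: jump straight to line `first` (1-based) with seek
# and read `count` lines from there. Assumes `first` is within the file.
def read_lines_at(path, offsets, first, count)
  File.open(path) do |f|
    f.seek(offsets[first - 1])
    f.each_line.first(count)
  end
end
```

For a single pass over a file this is slower than simply counting lines, since the index has to be built first.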

木森分化 2024-10-25 05:16:22

Sounds like rio might be of help here. It provides you with a lines() method.

晒暮凉 2024-10-25 05:16:22

You can use IO#readlines, which returns an array containing all the lines. Note that this loads the whole file into memory:

# file_in is a path here; [500, 100] is 100 elements from index 500, i.e. lines 501-600
IO.readlines(file_in)[500, 100].each do |line|
  # line is one line of the file (including its trailing \n)
  # stuff
end

or

f = File.new(file_in)
f.readlines[500, 100].each do |line|
  # line is one line of the file (including its trailing \n)
  # stuff
end
f.close # File.new does not close the file for you