How to wrap a Ruby IO with a sliding-window filter

Posted on 2024-11-16 02:08:20

I'm using an opaque API in some Ruby code which takes a File/IO as a parameter. I want to be able to pass it an IO object that only gives access to a given range of data in the real IO object.

For example, I have an 8 GB file, and I want to give the API an IO object that covers a 1 GB range in the middle of my real file.

real_file = File.new('my-big-file')
offset = 1 * 2**30 # start 1 GB into it
length = 1 * 2**30 # end 1 GB after start
filter = IOFilter.new(real_file, offset, length)

# The api only sees the 1GB of data in the middle
opaque_api(filter)

The filter_io project looks like it would be the easiest to adapt to do this, but doesn't seem to support this use case directly.

Comments (2)

你与昨日 2024-11-23 02:08:20

I think you would have to write it yourself, as it seems like a rather specific thing: you would have to implement all of IO's methods (or the subset that you need) using a chunk of the opened file as the data source. One example of what makes this special is writing to such a stream: you would have to take care not to cross the boundary of the given segment, i.e. constantly keep track of your current position in the big file. It doesn't seem like a trivial job, and I don't see any shortcuts that could help you there.
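As a rough illustration of the "implement the subset you need" idea, here is a minimal read-only sketch. The class name WindowIO and its method set are assumptions made for this example, not an existing library, and it only helps if the opaque API sticks to read, seek, pos, rewind, size and eof?. It tracks a position relative to the window and never reads past the window's end.

```ruby
require 'tempfile'

# Hypothetical read-only wrapper (WindowIO is an assumed name, not a real
# library class). It exposes only bytes offset...(offset + length) of the
# underlying IO.
class WindowIO
  attr_reader :pos

  def initialize(io, offset, length)
    @io = io
    @offset = offset
    @length = length
    @pos = 0 # position relative to the start of the window
  end

  def size
    @length
  end

  def eof?
    @pos >= @length
  end

  def rewind
    @pos = 0
  end

  # Supports IO::SEEK_SET, IO::SEEK_CUR and IO::SEEK_END. Unlike a real IO,
  # out-of-range targets are clamped to the window instead of raising.
  def seek(amount, whence = IO::SEEK_SET)
    target = case whence
             when IO::SEEK_SET then amount
             when IO::SEEK_CUR then @pos + amount
             when IO::SEEK_END then @length + amount
             else raise ArgumentError, "unsupported whence: #{whence}"
             end
    @pos = target.clamp(0, @length)
    0
  end

  # Roughly mirrors IO#read: with a byte count it returns nil once the
  # window is exhausted; without one it returns the rest of the window.
  def read(nbytes = nil)
    remaining = @length - @pos
    if remaining <= 0
      return (nbytes.nil? || nbytes.zero?) ? '' : nil
    end
    nbytes = nbytes.nil? ? remaining : [nbytes, remaining].min
    @io.seek(@offset + @pos) # re-seek each time in case the real IO moved
    data = @io.read(nbytes)
    @pos += data.bytesize if data
    data
  end
end

# Small demonstration on a temporary 20-byte file.
Tempfile.create('window-demo') do |f|
  f.binmode
  f.write('0123456789abcdefghij')
  f.flush
  window = WindowIO.new(f, 5, 10)
  puts window.read(4) # first four bytes of the window
  puts window.read    # the remainder of the window
end
```

This is duck typing only: an API that checks for a real File, or that calls something like fileno, would still reject it. Write support would need the same boundary checks applied to write.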

Perhaps you can find some OS-based solution, e.g. making a loopback device out of the relevant part of the large file (see man losetup, in particular the -o and --sizelimit options).

Variant 2:

If you are ok with keeping the contents of the window in memory all the time, you may wrap StringIO like this (just a sketch, not tested):

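require 'stringio'

def sliding_io(filename, offset, length)
  # open in binary read/write mode so bytes are not transcoded
  File.open(filename, 'r+b') do |f|
    # read the window into a buffer (empty if offset is past end of file)
    f.seek(offset)
    buf = f.read(length) || ''
    # wrap the buffer in a StringIO and pass it to the given block;
    # writes through buf_io modify buf in place
    StringIO.open(buf) do |buf_io|
      yield(buf_io)
    end
    # write the altered buffer back, truncated to the window length
    f.seek(offset)
    f.write(buf[0, length])
  end
end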

And use it as you would use the block variant of IO.open.

清风不识月 2024-11-23 02:08:20

I believe the IO object has the functionality you are looking for. I've used it before for MD5 hash summing similarly sized files.

require 'digest'

incr_digest = Digest::MD5.new
# read the file in 50,000-byte chunks and feed each one to the digest
File.open(filename, 'rb') do |io|
  while (chunk = io.read(50000))
    incr_digest << chunk
  end
end

This is the block I used; it passes each chunk to the MD5 Digest object.

http://www.ruby-doc.org/core/classes/IO.html#M000918
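If the consumer is your own code rather than an opaque API, the chunked loop above can be restricted to a byte range by seeking first and counting down the remaining bytes. The following is only a sketch: digest_range and its chunk_size parameter are hypothetical names introduced here, not part of Ruby's standard library.

```ruby
require 'digest'

# Hypothetical helper: feeds only the bytes in offset...(offset + length)
# of a file to an MD5 digest, reading in chunks.
def digest_range(filename, offset, length, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(filename, 'rb') do |io|
    io.seek(offset)
    remaining = length
    while remaining > 0
      chunk = io.read([chunk_size, remaining].min)
      break if chunk.nil? # reached end of file before the window ended
      digest << chunk
      remaining -= chunk.bytesize
    end
  end
  digest.hexdigest
end
```

For the 8 GB file from the question, this would be called as digest_range('my-big-file', 2**30, 2**30). It does not solve the original problem of handing a restricted IO to a third-party API; it only shows the range-limited read loop.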
