Calling File.readlines (or an equivalent) on compressed files without unzipping them first
I'm trying to write a rake task that scans through some gzipped log files and extracts data that I can then use to update some database records. Because of space limitations on our server, I can't unzip all of the files and just call File.readlines on them, which is what I would normally do.
Can anyone suggest an alternative approach that doesn't involve unzipping the files? What I need to do is scan for lines matching a particular regex, then extract some text from the line AFTER the matched line. The files are all gzipped (.gz).
Comments (1)
You can open the gzipped file using Ruby, and perform your searching as if you had opened the file as normal, like this:
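A minimal sketch of that idea, assuming Ruby's standard-library `Zlib::GzipReader`. The helper name `lines_after_matches` and its parameters are hypothetical and only for illustration:

```ruby
require 'zlib'

# Hypothetical helper: returns the line that follows each line matching `pattern`.
# It decompresses in memory (nothing is written to disk) and reads every line at once.
def lines_after_matches(path, pattern)
  lines = nil
  Zlib::GzipReader.open(path) { |gz| lines = gz.readlines }

  # Walk over consecutive pairs and keep the second line whenever the first matches.
  lines.each_cons(2)
       .select { |current, _following| current =~ pattern }
       .map { |_current, following| following.chomp }
end
```

It could be called as `lines_after_matches('app.log.gz', /ERROR/)`, where the file name and regex are placeholders.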
Be aware that this loads the entire decompressed file into memory.
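If memory is a concern, one possible alternative is to stream the file line by line rather than reading it all at once. The sketch below again assumes `Zlib::GzipReader` from the standard library; `each_line_after_match` is a hypothetical name, not something from the original answer:

```ruby
require 'zlib'

# Hypothetical helper: yields the line that follows each line matching `pattern`,
# holding only one line in memory at a time.
def each_line_after_match(path, pattern)
  return enum_for(:each_line_after_match, path, pattern) unless block_given?

  Zlib::GzipReader.open(path) do |gz|
    previous_matched = false
    gz.each_line do |line|
      # Emit this line if the one before it matched the pattern.
      yield line.chomp if previous_matched
      previous_matched = line.match?(pattern)
    end
  end
end
```

In a rake task this might be used as `each_line_after_match('app.log.gz', /ERROR/) { |text| ... }`, with the database update going inside the block.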