Ruby: using a CSV file as a database
I think I may not have explained my question well enough the first time.
I want to open a bunch of text and binary files and scan them with my regular expression. What I need from the CSV is the data in the second column, which holds the paths to all the files, so that I know which files to open.
Once a file is opened and scanned with the regexp, anything that matches is displayed on the screen. I am sorry for the confusion, and thank you so much for everything!
Hello,
I am sorry for asking what is probably a simple question. I am new to Ruby and would appreciate any guidance.
I am trying to use a CSV file as an index to drive other actions.
In particular, I have a CSV file that looks like:
id, file, description, date
1, /dir_a/file1, this is the first file, 02/10/11
2, /dir_b/file2, this is the second file, 02/11/11
I want to open every file listed in the "file" column and search it for a regular expression.
I know that you can refer to each column by its header with the CSV class:
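require 'rubygems'
require 'csv'
require 'pp'

# Read the whole index, treating the first line as the header row.
index = CSV.read("files.csv", :headers => true)
index.each do |row|
  # Look up the value by column name.
  puts row['file']
end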
I know how to create a loop that opens every file and searches for a regexp in each one, and if there is a match, displays it:
regex = /[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}/
Dir.glob('/home/Bob/**/*').each do |file|
  # Skip directories and other non-regular entries.
  next unless File.file?(file)
  File.open(file, "rb") do |f|
    # Scan each line once and print any matches found on it.
    f.each_line do |line|
      unless (pattern = line.scan(regex)).empty?
        puts "#{pattern}"
      end
    end
  end
end
Is there a way I can use the contents of the second column in my CSV file as the variable for opening each of the files, search for the regexp, and, if there is a match in the file, output the CSV row that had the match to a new CSV?
Thank you in advance!
Comments (1)
At a quick glance it looks like you could reduce it to:
A CSV file shouldn't be binary, so you can drop the 'rb' when opening the file, letting us reduce the file read to foreach, which iterates over the file, returning it line by line. The depth of the files in your directory hierarchy is in question based on your sample code. It's not really clear what's going on there.
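As a rough illustration of reading a file with foreach, here is a minimal sketch. It is not the answer's original code. The helper name matches_in and the constant UUID_LIKE are made up for this example, and the pattern is the one from the question written with the shorter {8} form.

```ruby
# Pattern from the question, written with {n} instead of {n,n}.
UUID_LIKE = /[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}/

# Hypothetical helper: returns every match of +regex+ found in the file at +path+.
def matches_in(path, regex = UUID_LIKE)
  matches = []
  # File.foreach opens the file, yields it line by line, and closes it afterwards.
  File.foreach(path) do |line|
    matches.concat(line.scan(regex))
  end
  matches
end
```

Calling matches_in on a path returns an array of matched strings, which is empty when nothing matched. This version reads in text mode, so it suits plain text files.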
EDIT:
In your question you said:
According to the spec:
It goes on to say:
So, if you're seeing binary data, you shouldn't be, because according to the spec it isn't CSV. Unfortunately the spec has been abused over the years, so it's possible you are seeing binary data in the file. If so, continue to use 'rb' as the file mode, but do it cautiously.
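If the files being scanned really do contain binary data, one possible way to keep the 'rb' mode while still iterating line by line is sketched below. The helper name binary_matches is hypothetical. The sketch assumes the pattern contains only ASCII characters, so that it can be matched against the raw byte strings that binary mode produces.

```ruby
# Hypothetical helper: collects regex matches from a file read as raw bytes.
def binary_matches(path, regex)
  matches = []
  # mode: 'rb' makes each line a binary (ASCII-8BIT) string, so bytes that
  # are not valid text in the default encoding do not get in the way.
  File.foreach(path, mode: 'rb') do |line|
    matches.concat(line.scan(regex))
  end
  matches
end
```

In text mode, scanning a line that contains invalid byte sequences can raise an ArgumentError, which is one reason to read in binary mode when the contents are unknown.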
An important question to ask is whether you can read the file using Ruby's CSV library, which makes a lot of this a moot discussion.
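Assuming the index can be read with Ruby's CSV library, the whole task from the question might be sketched roughly as follows. The names PATTERN, file_matches? and write_matching_rows are invented for this example. The stripping of headers and paths is an assumption based on the sample index, which has a space after each comma, so without it the header would be " file" rather than "file".

```ruby
require 'csv'

# Pattern from the question, written with {n} instead of {n,n}.
PATTERN = /[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}/

# Hypothetical helper: true if any line of the file matches the regex.
# The file is read in binary mode because it may be text or binary.
def file_matches?(path, regex)
  File.foreach(path, mode: 'rb').any? { |line| line.match?(regex) }
end

# Hypothetical helper: copies every index row whose file contains a match
# into a new CSV at +output_path+, preceded by the original header row.
def write_matching_rows(index_path, output_path, regex = PATTERN)
  strip = ->(header) { header.strip }
  index = CSV.read(index_path, headers: true, header_converters: strip)

  CSV.open(output_path, 'w') do |out|
    out << index.headers
    index.each do |row|
      path = row['file'].to_s.strip
      next unless File.file?(path) # skip missing paths and directories
      out << row.fields if file_matches?(path, regex)
    end
  end
end
```

A call such as write_matching_rows("files.csv", "matches.csv") would then leave only the rows whose files contained a match; the output file name here is just an example.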