Ruby: Using a CSV as a database

Posted on 2024-10-17 18:46:33

I think I may not have explained my question well enough the first time.

I want to open a bunch of text and binary files and scan them with my regular expression. What I need from the CSV is the data in the second column, which holds the paths to all the files, so that I know which files to open.

Once a file is opened and scanned with the regexp, any match is displayed on the screen. I am sorry for the confusion, and thank you so much for everything!

Hello,

I am sorry for asking what is probably a simple question. I am new to Ruby and would appreciate any guidance.

I am trying to use a CSV file as an index that drives other actions.

In particular, I have a csv file that looks like:

 id, file, description, date
 1, /dir_a/file1, this is the first file, 02/10/11
 2, /dir_b/file2, this is the second file, 02/11/11

I want to open every file defined in the "file" column and search for a regular expression.

I know that you can read the file with the CSV class and refer to each column by its header:

require 'csv'

# Read the whole index; with :headers => true each row can be accessed by column name
index = CSV.read("files.csv", :headers => true)

index.each do |row|
  puts row['file']
end

I know how to create a loop that opens every file and searches for a regexp in each file, and if there is a match, displays it:

regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/

Dir.glob('/home/Bob/**/*').each do |file|
  next unless File.file?(file)   # skip directories
  File.open(file, "rb") do |f|   # binary mode, since some files may be binary
    f.each_line do |line|
      unless (pattern = line.scan(regex)).empty?
        puts "#{pattern}"
      end
    end
  end
end

Is there a way I can use the contents of the second column in my CSV file as the variable for opening each of the files, search each one with the regexp and, if there is a match in a file, output the CSV row that pointed to that file to a new CSV?

Thank you in advance!!!!

Comments (1)

浮生面具三千个 2024-10-24 18:46:33

At a quick glance it looks like you could reduce it to:

index.each do |row|
  File.foreach(row['file']) do |line|
    match = line[regex]   # first match on this line, or nil
    puts match if match
  end
end

A CSV file shouldn't be binary, so you can drop the 'rb' when opening the file. That lets us reduce the file read to File.foreach, which iterates over the file and yields it line by line.

The depth of the files in your directory hierarchy is in question based on your sample code. It's not really clear what's going on there.
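The question also asks for the matching index rows to be written to a new CSV, which the loop above does not do. The following is a minimal sketch of one way to do that, not a definitive implementation. The helper name `copy_matching_rows`, the constant `MATCH_REGEX` and the output path are assumptions made for illustration. The header converter and the `strip` call are there only because the sample index has a space after each comma.

```ruby
require 'csv'

# Pattern from the question, written with {8} instead of {8,8}.
MATCH_REGEX = /[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}/

# Hypothetical helper: copies every row of the index whose "file" column
# points at a file containing a match into a new CSV at out_path.
def copy_matching_rows(index_path, out_path, regex = MATCH_REGEX)
  # Strip header names so " file" in the sample index becomes "file".
  index = CSV.read(index_path, headers: true,
                   header_converters: ->(h) { h.strip })

  CSV.open(out_path, 'w') do |out|
    out << index.headers
    index.each do |row|
      path = row['file'].to_s.strip
      next unless File.file?(path)   # skip missing paths and directories

      # Binary mode, since the question says some targets may be binary.
      matched = File.open(path, 'rb') do |f|
        f.each_line.any? { |line| line.match?(regex) }
      end
      out << row if matched
    end
  end
end
```

Calling `copy_matching_rows('files.csv', 'matches.csv')` would then leave only the rows whose files matched in `matches.csv` (a file name chosen here purely as an example).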


EDIT:

it tells me that "regex" is an undefined variable

In your question you said:

regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/

the files I open to do the search on may be a binary.

According to the spec:

Common usage of CSV is US-ASCII, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter.

It goes on to say:

Security considerations:

CSV files contain passive text data that should not pose any
risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns
in the program processing CSV data. Additionally, private data
may be shared via this format (which of course applies to any text
data).

So, if you're seeing binary data, you shouldn't be, because according to the spec it isn't CSV. Unfortunately the spec has been abused over the years, so it's possible you are seeing binary data in the file. If so, continue to use 'rb' as the file mode, but do it cautiously.

An important question to ask is whether you can read the file using Ruby's CSV library. If you can, a lot of this discussion is moot.
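If the files being searched really can contain binary data, one cautious option is to keep 'rb' and collect the matches line by line. This is only a sketch under that assumption. `matches_in` and `UUID_LIKE` are hypothetical names, and the regex is the one from the question with `{8,8}` shortened to `{8}`.

```ruby
# Pattern from the question; an ASCII-only regex can be applied to binary strings.
UUID_LIKE = /[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}/

# Hypothetical helper: returns every match of regex found in the file at path.
def matches_in(path, regex)
  # 'rb' disables newline and encoding conversion, so raw bytes are left alone.
  File.open(path, 'rb') do |f|
    f.each_line.flat_map { |line| line.scan(regex) }
  end
end
```

Because the helper returns an array instead of printing, the caller can decide whether to display the matches or only test whether any exist.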
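One way to answer that question is simply to try parsing the file and see whether the library complains. The sketch below assumes a hypothetical helper name, `parseable_csv?`. It rescues `CSV::MalformedCSVError`, which the library raises for malformed input, and also `ArgumentError` as a precaution, since some versions may report invalid byte sequences that way.

```ruby
require 'csv'

# Hypothetical helper: true if Ruby's CSV library can parse the whole file.
def parseable_csv?(path)
  CSV.foreach(path, headers: true) { |_row| }  # walk every row, discarding it
  true
rescue CSV::MalformedCSVError, ArgumentError
  false
end
```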
