Rails 3: Full-text search of Word documents?
My company has a collection of about 3500 highly-structured Word docs (and growing) that contain multiple choice questions from one of our products. I've been tasked with writing a front-end that will let people find and use these in other products. There is some metadata on them that would go in a database, but we'd also like full-text search.
For the front-end I've been given the choice of either MS Access (because I know it well) or Rails (because I'm supposed to be learning it). I've done one Rails app and would prefer to continue with it.
Rather than load the documents into the database, I thought it made more sense to just have them on the file system and store paths to them in the database.
I know I can use Ferret to search database fields but what's the best way to add full-text searching to a Rails app for a pile of files on the filesystem?
I'm not sure whether there are any gems that would search the Word files for you. Although you mentioned that you don't want to load the entire documents into the database, you might consider copying just the text content of each file into your database. You can use the win32ole library for this (http://ruby-doc.org/stdlib/libdoc/win32ole/rdoc/classes/WIN32OLE.html). If I had to implement this, I would run a cron job every night (or at whatever frequency seems appropriate) that refreshes the database content with the changes in the Word files.
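To make the idea concrete, here is a minimal sketch of that refresh step. It is not a tested implementation: `extract_word_text`, `stale_documents`, and `refresh_index` are hypothetical names, the two hashes stand in for database columns (extracted text and a last-imported timestamp), and the OLE part assumes a Windows machine with Microsoft Word installed.

```ruby
# Extracts plain text from a Word file through OLE automation.
# Assumption: runs on Windows with Microsoft Word installed.
def extract_word_text(path)
  require "win32ole" # loaded lazily so the rest of the file works on any OS
  word = WIN32OLE.new("Word.Application")
  word.Visible = false
  # Word expects an absolute path with backslashes.
  doc = word.Documents.Open(File.expand_path(path).tr("/", "\\"))
  doc.Content.Text
ensure
  doc.Close(false) if doc # false = do not save changes
  word.Quit if word
end

# Returns the Word files under root that have never been imported or that
# changed since their last import. indexed_at maps path => Time and is a
# stand-in for a timestamp column in the database.
def stale_documents(root, indexed_at)
  Dir.glob(File.join(root, "**", "*.{doc,docx}")).sort.select do |path|
    last = indexed_at[path]
    last.nil? || File.mtime(path) > last
  end
end

# Re-imports every stale document. The extractor is injectable so the
# refresh logic can be exercised without Word being present.
def refresh_index(root, store, indexed_at, extractor: method(:extract_word_text))
  stale_documents(root, indexed_at).each do |path|
    store[path] = extractor.call(path)
    indexed_at[path] = File.mtime(path)
  end
end
```

In a Rails app the two hashes would presumably become columns on a model, and `refresh_index` would be called from a rake task scheduled nightly. Once the text is in a database field, it can be searched the same way as any other field, for example with Ferret as mentioned in the question.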