How do I get PowerPoint and Excel documents into a full-text search index (such as Sphinx or PostgreSQL text search)?
I have a Rails application that accepts file uploads of arbitrary business documents, such as Word, Excel, PowerPoint, and PDF files. I need to make all of these documents searchable, preferably using Sphinx or PostgreSQL full-text search. What are the best solutions?
Comments (1)
As pointed out in the comments, this is covered pretty well by an older question.
In short: you will have to extract the text from those files and store it in the database for Sphinx to index, and most likely do the same for PostgreSQL full-text search. Sphinx can now also index plain text files (as long as a database column points to the file), but that still requires a separate tool to extract the text from PDF, DOC, XLS, and similar formats.
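
For illustration, the extract-then-store step might look roughly like the sketch below. It is written in Python for brevity; a Rails app would do the equivalent in Ruby. Everything named here is an assumption and not part of the original answer: the choice of command-line extractors (`pdftotext` from poppler-utils, `antiword`, and `xls2csv` and `catppt` from the catdoc package, all assumed to be installed), the hypothetical `documents` table with `filename`, `body`, and `body_tsv` columns, and the `%s` placeholder style used by drivers such as psycopg.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to an external text extractor.
# Each entry builds the argv list for a tool assumed to be on the PATH.
EXTRACTORS = {
    ".pdf": lambda p: ["pdftotext", p, "-"],  # "-" writes the text to stdout
    ".doc": lambda p: ["antiword", p],
    ".xls": lambda p: ["xls2csv", p],
    ".ppt": lambda p: ["catppt", p],
}


def extraction_command(path):
    """Return the command that would extract plain text from `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor configured for {suffix!r}")
    return EXTRACTORS[suffix](str(path))


def extract_text(path):
    """Run the external extractor and return its output as text."""
    result = subprocess.run(
        extraction_command(path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def index_statement(path, text):
    """Build a parameterized INSERT for a hypothetical `documents` table.

    The extracted text is stored as-is and also converted with
    PostgreSQL's to_tsvector so that it can be searched.
    """
    sql = (
        "INSERT INTO documents (filename, body, body_tsv) "
        "VALUES (%s, %s, to_tsvector('english', %s))"
    )
    return sql, (str(path), text, text)
```

The statement and parameters returned by `index_statement` would be passed to whatever database driver the application uses. For Sphinx, the same extracted `body` column could serve as the indexed field.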