Search indexing tool for personal knowledge base files
I have a large number of plain text, RTF, HTML, PDF and CHM files that I store on a USB key as a personal knowledge base.
Up until now, to retrieve information, I've used standard file search tools (Windows Search, grep, etc.). However, these days a brute-force search can take minutes due to the sheer size of the data. PDF and CHM files are also more difficult to search.
Therefore I'm looking for a text indexing tool that will work well in this situation. I want to avoid a dependency on an RDBMS (e.g. SQL Server, MySQL), as I would be using it on many different computers and do not want installation hassles. A portable tool would be ideal. On some machines I will also often be without internet access.
Something that provides a simple GUI allowing query input and quick access to results would be great.
I've thought about writing this myself, but it's a bit more work than I have time for right now.
Comments (2)
Google Desktop does this indexing for you, as does Windows Desktop Search (on Windows). Beagle is a great Linux search tool.
If you fancy a bit of tinkering, I'd use Lucene, either the pure Java version or a copy grabbed from https://lucene.apache.org/.
It is a full-text indexing and search library, and would be perfect for running off the USB key.
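To give a rough idea of why a full-text index answers queries faster than a brute-force scan, here is a minimal sketch of an inverted index in plain Python. It is not Lucene's API and not how Lucene is implemented; the function names (`tokenize`, `build_index`, `search`, `load_text_files`) are made up for illustration, and it only handles plain text. PDF and CHM files would need a separate text-extraction step that is not shown.

```python
import re
from collections import defaultdict
from pathlib import Path

# Terms are runs of ASCII letters and digits (a simplifying assumption).
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


def build_index(docs):
    """Map each term to the set of document names that contain it.

    `docs` is a hypothetical dict of {document name: document text}.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every term of the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def load_text_files(root):
    """Read every .txt file under `root`, e.g. the folder on the USB key."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(root).rglob("*.txt")
    }
```

The index is built once, for example with `build_index(load_text_files("kb"))`, where `kb` is a placeholder folder name. After that, each query is a few dictionary lookups and set intersections instead of a pass over every file. A real tool such as Lucene also persists the index to disk, ranks results and supports phrase queries, none of which this sketch attempts.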