Can you recommend a full-text search engine? (Preferably open source)
I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:
- Skip common words, such as `the`, `of`, `and`, etc.
- Support stemming, i.e. a search for `run` also finds documents containing `runner`, `running` and `ran`.
- Be able to update its index in the background as new documents are added to the database.
- Be able to provide search word suggestions (like Google Suggest)
- Have a well-documented API
To illustrate, assume the database has just two documents:
Document 1: This is a test of text search.
Document 2: Testing is fun.
The following words should be in the index: `fun`, `search`, `test`, `testing`, `text`. If the user types `t` in the search box, I want the application to be able to suggest `test`, `testing` and `text`. (Ideally, the application should be able to query the search engine for the 10 most common search words starting with `t`.) A search for `testing` should return both documents.
Other points:
- I don't need multi-user support
- I don't need support for complex queries
- The database resides on the user's computer, so the indexing should be performed locally.
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).
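The suggest requirement above can be sketched without committing to a particular engine. The following is a minimal illustration, not the API of CLucene or Xapian: it assumes the application can obtain a term-to-frequency table from whatever index it uses. The names `TermIndex` and `suggest` are hypothetical, and the frequencies are made up for the two-document example.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory term table: indexed term -> how often it occurs.
// A real engine would expose this through its own term-enumeration API.
using TermIndex = std::map<std::string, int>;

// Return up to `limit` terms that start with `prefix`, most frequent first.
// Terms with equal frequency stay in alphabetical order.
std::vector<std::string> suggest(const TermIndex& index,
                                 const std::string& prefix,
                                 std::size_t limit) {
    std::vector<std::pair<std::string, int>> matches;

    // std::map keeps keys sorted, so every term sharing the prefix sits in
    // one contiguous range that begins at lower_bound(prefix).
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        matches.push_back(*it);
    }

    // Order by descending frequency; stable_sort preserves the alphabetical
    // order among terms that tie.
    std::stable_sort(matches.begin(), matches.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });

    if (matches.size() > limit) {
        matches.resize(limit);
    }

    std::vector<std::string> terms;
    for (const auto& m : matches) {
        terms.push_back(m.first);
    }
    return terms;
}
```

With a table holding `fun`, `search`, `test`, `testing` and `text`, calling `suggest(index, "t", 10)` yields the three terms beginning with `t`. A sorted container keeps the prefix scan proportional to the number of matching terms rather than to the size of the whole vocabulary.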
Comments (3)
Also check out Sphinx
You can use CLucene for C/C++ and Sphider for PHP. Both are free and take some time to set up and use, but they are not difficult to understand.
I have used the dtSearch module with great success.
They provide a DLL that you can use from your application to index just about anything, and it does more than what you are asking for.
Note: it is not free.
I did not see in the question that you asked for a free one, so I am describing my favorite.
dtSearch inspired me to create an indexer for my own language, Ellinika (Greek), for my sites, because I could not find what I was looking for in my language.
If you just need to find suggestions for your words, there are modules that do only stemming; I took my reference from here: http://tartarus.org/~martin/PorterStemmer/
For example, if you have a database such as MS SQL that already does some basic indexing, and someone searches for a word and you find nothing, you can stem the word yourself and search again...
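The "stem it yourself and search again" idea can be sketched roughly as below. This is only an illustration under stated assumptions: `toy_stem` is a deliberately crude suffix stripper and is not the Porter algorithm from the link above, and `Lookup` stands in for whatever exact-match query the database already provides. Both names are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stemmer for illustration only: strips a few common English suffixes.
// It is NOT the Porter algorithm; a real stemmer handles far more cases.
std::string toy_stem(const std::string& word) {
    static const char* const suffixes[] = {"ing", "ed", "s"};
    for (const char* s : suffixes) {
        const std::string suffix(s);
        // Require at least three characters to remain after stripping.
        if (word.size() >= suffix.size() + 3 &&
            word.compare(word.size() - suffix.size(), suffix.size(),
                         suffix) == 0) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;
}

// Hypothetical stand-in for the database's exact-word search:
// takes a word, returns the ids of the documents that contain it.
using Lookup = std::function<std::vector<int>(const std::string&)>;

// Try the word as typed; if nothing is found, retry with its stem.
std::vector<int> search_with_fallback(const Lookup& lookup,
                                      const std::string& word) {
    std::vector<int> hits = lookup(word);
    if (!hits.empty()) {
        return hits;
    }
    const std::string stem = toy_stem(word);
    if (stem == word) {
        return hits;  // Stemming changed nothing, so a retry cannot help.
    }
    return lookup(stem);
}
```

Note that this fallback stems only when the first lookup comes back empty. To make a search for `testing` return every document containing `test` as well, the terms would have to be stemmed at indexing time too.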