Search term suggestion

Posted 2024-09-16 11:10:46

This question has been asked in various ways before, but I'm wondering if people who have experience with automatic search term suggestion could offer advice on the most useful and efficient approaches. Here's the scenario:

I'm just starting on a website for a book that is a dictionary of terms (roughly 1,000 entries, with 300-word explanations on average). Many of the terms are fairly obscure, and many visitors to the site will probably not know how to spell them. The publisher wants to make full-text search available for every entry, so I'm hoping to implement a search engine with spelling correction. The main site will probably be done in a PHP framework (or possibly Django) with a MySQL database.
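
Because visitors will be misspelling terms from this particular book, one common approach is to draw suggestions only from the book's own vocabulary. A minimal Python sketch of collecting that vocabulary with word counts, assuming the entries can be loaded as a mapping from headword to explanation text (the sample `entries` data, `build_vocabulary`, and the headword weight of 10 are all hypothetical):

```python
import re
from collections import Counter

# Hypothetical sample data: headword -> explanation text.
entries = {
    "anacoluthon": "An abrupt change of syntax within a sentence.",
    "zeugma": "A figure of speech in which one word governs two others.",
}

HEADWORD_WEIGHT = 10  # arbitrary boost so headwords outrank incidental words
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return WORD_RE.findall(text.lower())


def build_vocabulary(entries):
    """Count words from headwords and explanations into one Counter."""
    vocab = Counter()
    for headword, explanation in entries.items():
        for word in tokenize(headword):
            vocab[word] += HEADWORD_WEIGHT
        vocab.update(tokenize(explanation))
    return vocab


vocab = build_vocabulary(entries)
```

At roughly 1,000 entries of about 300 words each (around 300,000 words in total), the whole vocabulary fits comfortably in memory and can simply be rebuilt whenever entries change.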

Can anyone with experience in this area give advice on the following:

  • With a set corpus of this nature, should I be using something like Lucene or Sphinx for the search engine?
  • As far as I can tell, neither of these has a built-in suggestion function. So it seems I will need to integrate one or more of the following. What are the advantages / disadvantages of:

I'm concerned about the specificity of my corpus, and don't want Google to start suggesting things that have nothing to do with this book. I'm also not sure whether I should try to use both a metaphone comparison and a Levenshtein comparison, or some other combination of techniques to capture both typos and attempts at phonetic spelling.
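
One way to combine the two kinds of matching is to accept a candidate term if it is within a small edit distance of the query or shares its phonetic key, and then rank by edit distance. The Python sketch below is an illustration under stated assumptions: it uses Soundex as a simpler stand-in for Metaphone, a hard-coded sample vocabulary in place of the real headword list, and a hypothetical `suggest` function with an arbitrary `max_distance` of 2.

```python
# Soundex digit for each consonant; vowels, y, h and w have no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word):
    """American Soundex key (stand-in for Metaphone), e.g. "Robert" -> "R163"."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code is counted again
    return key.ljust(4, "0")


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def suggest(query, vocabulary, max_distance=2, limit=5):
    """Return vocabulary terms that are near-typos or sound-alikes of query."""
    query = query.lower()
    query_key = soundex(query)
    candidates = []
    for term in vocabulary:
        distance = levenshtein(query, term)
        sounds_alike = soundex(term) == query_key
        if distance <= max_distance or sounds_alike:
            # Rank by edit distance; on ties prefer phonetic matches.
            candidates.append((distance, not sounds_alike, term))
    candidates.sort()
    return [term for _, _, term in candidates[:limit]]


# Hypothetical sample vocabulary standing in for the book's headwords.
VOCABULARY = ["anacoluthon", "anaphora", "metaphor",
              "metonymy", "synecdoche", "zeugma"]
```

Here "zuegma" is caught by edit distance, while "sinekdokee" is too far from "synecdoche" by edit distance and is only caught by the phonetic key. Scanning every headword on each query is cheap at around 1,000 terms. If the site ends up in PHP, the core string functions `levenshtein()`, `soundex()` and `metaphone()` provide the same building blocks without custom code.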

Comments (1)

岁月静好 2024-09-23 11:10:46

You might want to consider Apache Solr, which is a web service encapsulation of Lucene, and runs in a J2EE container like Tomcat. You'll get term suggestion, spell check, porting, stemming and much more. It's really very nice.

See here for a full listing of its features relating to queries.

There are Django and PHP libraries for Solr.
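
Those client libraries talk to Solr over HTTP, and the spellcheck component can also be queried directly. A minimal Python sketch using only the standard library, assuming a local Solr on the default port 8983, a hypothetical core named `dictionary`, and a request handler at `/spell` with the SpellCheckComponent enabled (as in Solr's example configuration). The exact response layout can vary between Solr versions and with the `json.nl` setting, so treat the parser as an assumption to check against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions about a local setup: default port 8983, a hypothetical core
# named "dictionary", and a /spell handler with the SpellCheckComponent enabled.
SOLR_SPELL_URL = "http://localhost:8983/solr/dictionary/spell"


def spellcheck_url(query, count=5):
    """Build a spellcheck request for the given user query."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.count": count,     # max suggestions per misspelled token
        "spellcheck.collate": "true",  # also ask for a corrected whole query
        "wt": "json",
    }
    return SOLR_SPELL_URL + "?" + urlencode(params)


def parse_suggestions(response):
    """Map each misspelled token to its suggested corrections.

    Assumes the default flat JSON layout, where "suggestions" alternates
    token, details, token, details, ...
    """
    flat = response.get("spellcheck", {}).get("suggestions", [])
    result = {}
    for token, details in zip(flat[0::2], flat[1::2]):
        if not isinstance(details, dict):
            continue  # older versions mix in pairs like "correctlySpelled", false
        words = []
        for item in details.get("suggestion", []):
            # With spellcheck.extendedResults=true each item is {"word": ..., "freq": ...}
            words.append(item["word"] if isinstance(item, dict) else item)
        result[token] = words
    return result


def fetch_suggestions(query):
    """Query a running Solr instance and return parsed suggestions."""
    with urlopen(spellcheck_url(query)) as resp:
        return parse_suggestions(json.load(resp))
```

With an index-based spellchecker such as DirectSolrSpellChecker, suggestions come from terms in your own indexed field, which keeps them within the book's vocabulary.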

I wouldn't recommend using Google Suggest for such a specialised corpus anyway, and with Solr you won't need it.

Hope this helps.
