Where can I learn more about the Google search "did you mean" algorithm?
Possible Duplicate:
How do you implement a “Did you mean”?
I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine.
Is there source code available for such a thing, or where can I find articles that would help me build my own?
You should check out Peter Norvig's article about implementing a spell checker in a few lines of Python:
How to Write a Spelling Corrector. It also has links to implementations in other languages (e.g. C#).
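To give a feel for the approach, here is a minimal sketch in the spirit of that article, not Norvig's actual code. The tiny CORPUS string and the names edits1, known and correct are assumptions made for illustration; a real corrector would count word frequencies from a large body of text.

```python
import re
from collections import Counter

# Tiny stand-in corpus (an assumption for illustration only).
CORPUS = ("the quick brown fox jumps over the lazy dog "
          "and the spelling of the word is correct")
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the corpus."""
    return {w for w in words if w in WORDS}


def correct(word):
    """Prefer the word itself, then known words at distance 1, then 2."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    # Among candidates at the same distance, pick the most frequent one.
    return max(candidates, key=lambda w: WORDS[w])


print(correct("speling"))
print(correct("teh"))
```

The quality of the suggestions depends almost entirely on the corpus used to build the word counts.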
I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter said that (at least part of) their algorithm has little intelligence at all; rather, it utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, then searches for "Britney Spears" and clicks on something, they can make a fair guess about what the user was searching for and can suggest that in future.
Disclaimer: This may have just been part of their algorithm.
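A rough sketch of how that idea could be mined from a query log follows. This is only a guess at the general shape, not Google's implementation: the session format (a list of (query, clicked) pairs per user), the function names and the min_count threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_rewrites(sessions):
    """Count 'query with no click' -> 'next query with a click' pairs.

    Each session is a list of (query, clicked) tuples in time order
    (a hypothetical log format).
    """
    rewrites = defaultdict(Counter)
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if not clicked1 and clicked2 and q1 != q2:
                rewrites[q1][q2] += 1
    return rewrites


def did_you_mean(rewrites, query, min_count=2):
    """Suggest the most common successful rewrite, if seen often enough."""
    if query not in rewrites:
        return None
    best, count = rewrites[query].most_common(1)[0]
    return best if count >= min_count else None


sessions = [
    [("brittany speares", False), ("britney spears", True)],
    [("brittany speares", False), ("britney spears", True)],
    [("brittany speares", False), ("brittany murphy", True)],
]
print(did_you_mean(build_rewrites(sessions), "brittany speares"))
```

The threshold is there so that a single user's unrelated follow-up search does not become a suggestion.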
Python has a module called difflib. It provides a function called get_close_matches (see the Python documentation). Could this library help you?
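A short sketch of how get_close_matches might be used for suggestions. The VOCABULARY list and the suggest wrapper are assumptions for illustration; n and cutoff are the function's real keyword parameters, shown here with their default values.

```python
import difflib

# Stand-in vocabulary (an assumption; use your own word list).
VOCABULARY = ["ape", "apple", "peach", "puppy"]


def suggest(word, vocabulary=VOCABULARY, n=3, cutoff=0.6):
    """Return up to n vocabulary entries similar to word, best match first."""
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=cutoff)


print(suggest("appel"))  # ['apple', 'ape']
```

Raising cutoff makes the matching stricter. For a large vocabulary this compares the word against every entry, so it may be too slow without some pre-filtering.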
You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html, which would give similar functionality.
You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/
I am not sure if it serves your purpose, but a string edit distance algorithm with a dictionary might suffice for a small application.
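As a minimal sketch of that idea, assuming Levenshtein distance as the edit distance and a plain list of words as the dictionary (the function names and the max_distance threshold are illustrative choices):

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_words(word, dictionary, max_distance=2):
    """Dictionary words within max_distance edits, nearest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


print(closest_words("serch", ["search", "speech", "march", "banana"]))
```

This scans the whole dictionary for every lookup, which is fine for a few thousand words but will not scale to a search-engine-sized vocabulary.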
I'd take a look at this article on Google bombing. It shows that it just suggests answers based on previously entered results.
AFAIK the "did you mean?" feature doesn't check the spelling. It only gives you another query based on the content parsed by Google.
A great chapter on this topic can be found in the openly available Introduction to Information Retrieval.
You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram
Using the Python ngram module: http://packages.python.org/ngram/index.html
You get:
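For illustration, the n-gram comparison itself can be sketched in plain Python without the ngram module. The padding scheme, the Jaccard-style score and the function names below are my own assumptions and may not match what that module computes.

```python
def ngrams(text, n=3):
    """Set of character n-grams, padded so word edges also form n-grams."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Shared n-grams divided by all n-grams (Jaccard index), from 0.0 to 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        return 1.0  # both strings produced no n-grams at all
    return len(ga & gb) / len(union)


def best_match(query, candidates, n=3):
    """Candidate whose n-gram set overlaps most with the query's."""
    return max(candidates, key=lambda c: similarity(query, c, n))


candidates = ["britney spears", "brittany murphy", "bruce springsteen"]
print(best_match("britny spears", candidates))
```

Unlike edit distance, the n-gram sets of the candidates can be indexed ahead of time, which helps with larger vocabularies.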
Take a look at Levenshtein automata.