Where can I learn more about Google Search's "Did you mean" algorithm?

Posted on 2024-09-24 08:25:08

Possible Duplicate:
How do you implement a “Did you mean”?

I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine:

[screenshot of Google's "Did you mean:" suggestion]

Is there source code available for such a thing or where can I find articles that would help me to build my own?


Comments (11)

折戟 2024-10-01 08:25:08

You should check out Peter Norvig's article about implementing a spell checker in a few lines of Python: How to Write a Spelling Corrector. It also has links to implementations in other languages (e.g. C#).
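
For a sense of what the article describes, here is a compressed sketch of that approach (a paraphrase, not Norvig's exact code): count word frequencies from a corpus, generate candidates within one or two edits of the input, and return the most frequent known candidate. The corpus file name big.txt is an assumption; any large plain-text file will do.

    import re
    from collections import Counter

    def words(text):
        return re.findall(r'[a-z]+', text.lower())

    # Word frequencies from a large corpus; "big.txt" is a placeholder name.
    WORDS = Counter(words(open('big.txt').read()))

    def edits1(word):
        # All strings one edit (delete, transpose, replace, insert) away from word.
        letters = 'abcdefghijklmnopqrstuvwxyz'
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def known(candidates):
        # Keep only candidates that actually occur in the corpus.
        return {w for w in candidates if w in WORDS}

    def correction(word):
        # Prefer the word itself, then distance-1 edits, then distance-2 edits.
        candidates = (known([word]) or known(edits1(word)) or
                      known(e2 for e1 in edits1(word) for e2 in edits1(e1)) or
                      [word])
        return max(candidates, key=WORDS.get)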

晨曦慕雪 2024-10-01 08:25:08

I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.

Disclaimer: This may have just been part of their algorithm
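
To make that description concrete, here is a rough, hypothetical sketch of the idea (not Google's actual pipeline): when a query that got no clicks is immediately followed in the same session by a query that did get a click, treat the second query as a candidate correction for the first and count how often each pair occurs.

    from collections import Counter, defaultdict

    def mine_reformulations(sessions):
        """sessions: ordered lists of (query, got_click) events, one list per user session."""
        pair_counts = defaultdict(Counter)
        for events in sessions:
            for (q1, clicked1), (q2, clicked2) in zip(events, events[1:]):
                # A no-click query followed by a clicked query looks like a correction.
                if not clicked1 and clicked2 and q1 != q2:
                    pair_counts[q1][q2] += 1
        return pair_counts

    def did_you_mean(query, pair_counts, min_support=2):
        suggestions = pair_counts.get(query)
        if suggestions:
            best, count = suggestions.most_common(1)[0]
            if count >= min_support:
                return best
        return None

    # Toy example with made-up log data:
    log = [
        [("brittany spears", False), ("britney spears", True)],
        [("brittany spears", False), ("britney spears", True)],
        [("mac ftp", True)],
    ]
    counts = mine_reformulations(log)
    print(did_you_mean("brittany spears", counts))  # -> "britney spears"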

苦笑流年记忆 2024-10-01 08:25:08

Python has a module called difflib. It provides a function called get_close_matches. From the Python documentation:

get_close_matches(word, possibilities[, n][, cutoff])

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?
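
As a quick illustration of how get_close_matches could back a "did you mean" prompt, here is a minimal sketch; the list of known queries is a made-up stand-in for whatever vocabulary or past-query log your application has.

    from difflib import get_close_matches

    KNOWN_QUERIES = ["britney spears", "configure ftp", "python difflib"]

    def did_you_mean(query, known=KNOWN_QUERIES, cutoff=0.8):
        # Only suggest an alternative when the query itself is not already known.
        if query in known:
            return None
        matches = get_close_matches(query, known, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    print(did_you_mean("brittany spears"))  # -> "britney spears"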

唔猫 2024-10-01 08:25:08

You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html, which provides similar functionality.

海的爱人是光 2024-10-01 08:25:08

You can check out the source code for Xapian which provides this functionality, as do a lot of other search libraries. http://xapian.org/

流绪微梦 2024-10-01 08:25:08

I am not sure if it serves your purpose, but a string edit distance algorithm with a dictionary might suffice for a small application.
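
For reference, here is a minimal sketch of that idea: the classic Levenshtein (edit) distance computed with dynamic programming, used to pick the closest dictionary word. The dictionary and distance threshold are made-up examples.

    def levenshtein(a, b):
        # previous/current hold edit distances between prefixes of a and b.
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                insert_cost = current[j - 1] + 1
                delete_cost = previous[j] + 1
                replace_cost = previous[j - 1] + (ca != cb)
                current.append(min(insert_cost, delete_cost, replace_cost))
            previous = current
        return previous[-1]

    def suggest(word, dictionary, max_distance=2):
        # Return the nearest dictionary word, but only if it is close enough.
        best = min(dictionary, key=lambda w: levenshtein(word, w))
        return best if levenshtein(word, best) <= max_distance else None

    print(suggest("speling", ["spelling", "spearing", "sparing"]))  # -> "spelling"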

救赎№ 2024-10-01 08:25:08

I'd take a look at this article on Google bombing. It shows that the feature simply suggests answers based on previously entered results.

唐婉 2024-10-01 08:25:08

AFAIK the "did you mean ?" feature doesn't check the spelling. It only gives you another query based on the content parsed by google.

゛清羽墨安 2024-10-01 08:25:08

A great chapter on this topic can be found in the openly available Introduction to Information Retrieval.

骄傲 2024-10-01 08:25:08

You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram

Using the Python ngram module: http://packages.python.org/ngram/index.html

    import ngram

    # Index the known strings so they can be searched by n-gram similarity.
    G2 = ngram.NGram(["iis7 configure ftp 7.5",
                      "ubunto configre 8.5",
                      "mac configure ftp"])

    print("String", "\t", "Similarity")
    # search() returns (string, similarity) pairs above the threshold,
    # sorted with the most similar first.
    for candidate, similarity in G2.search("iis7 configurftp 7.5", threshold=0.1):
        print(candidate, "\t", similarity)

You get:

    String  Similarity
    "iis7 configure ftp 7.5"    0.76
    "mac configure ftp"     0.24
    "ubunto configre 8.5"   0.19