全文搜索最接近的匹配

发布于 2024-07-12 09:49:20 字数 226 浏览 11 评论 0原文

我正在尝试为我的网站实现内部搜索,以便在用户输入错误的情况下为用户指明正确的方向,类似于 Google 搜索中的“您的意思是吗”:。

有人知道如何进行这样的搜索吗? 我们如何确定我们假设用户想要搜索的单词或短语的相关性?

  • 我使用 asp.net 和 sql server 2005 以及 FTS (fullTextSearch)

谢谢

I am trying to implement an internal search for my website that can point users in the right direction in case the mistype a word, something like the did you mean : in google search.

Does anybody have an idea how such a search can be done? How can we establish the relevance of the word or the phrase we assume the user intended to search for?

  • i use asp.net and sql server 2005 with FTS (fullTextSearch)

Thank you

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

烙印 2024-07-19 09:49:21

我能想到的最简单的方法是编写一个返回两个单词之间不匹配程度的函数,然后循环遍历所有单词并找到最好的单词。

我已经用分支定界方法完成了此操作。 让我挖一下代码:

bool matchWithinBound(char* a, char* b, int bound){
  // skip over matching characters
  while(*a && *b && *a == *b){a++; b++;}
  if (*a==0 && *b==0) return true;
  // if bound too low, quit
  if (bound <= 0) return false;
  // try assuming a has an extra character
  if (*a && matchWithinBound(a+1, b, bound-1)) return true;
  // try assuming a had a letter deleted
  if (*b && matchWithinBound(a, b+1, bound-1)) return true;
  // try assuming a had a letter replaced
  if (*a && *b && matchWithinBound(a+1, b+1, bound-1)) return true;
  // try assuming a had two adjacent letters swapped
  if (a[0] && a[1]){
    char temp;
    int success;
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    success = matchWithinBounds(a, b, bound-1);
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    if (success) return true;
  }
  // can try other modifications
  return false;
}

int DistanceBetweenWords(char* a, char* b){
  int bound = 0;
  for (bound = 0; bound < 10; bound++){
    if (matchWithinBounds(a, b, bound)) return bound;
  }
  return 1000;
}

The simplest approach I can think of is to write a function that returns the degree of mismatch between two words, and you loop through all the words and find the best ones.

I've done this with a branch-and-bound method. Let me dig up the code:

bool matchWithinBound(char* a, char* b, int bound){
  // skip over matching characters
  while(*a && *b && *a == *b){a++; b++;}
  if (*a==0 && *b==0) return true;
  // if bound too low, quit
  if (bound <= 0) return false;
  // try assuming a has an extra character
  if (*a && matchWithinBound(a+1, b, bound-1)) return true;
  // try assuming a had a letter deleted
  if (*b && matchWithinBound(a, b+1, bound-1)) return true;
  // try assuming a had a letter replaced
  if (*a && *b && matchWithinBound(a+1, b+1, bound-1)) return true;
  // try assuming a had two adjacent letters swapped
  if (a[0] && a[1]){
    char temp;
    int success;
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    success = matchWithinBounds(a, b, bound-1);
    temp = a[0]; a[0] = a[1]; a[1] = temp;
    if (success) return true;
  }
  // can try other modifications
  return false;
}

int DistanceBetweenWords(char* a, char* b){
  int bound = 0;
  for (bound = 0; bound < 10; bound++){
    if (matchWithinBounds(a, b, bound)) return bound;
  }
  return 1000;
}
习ぎ惯性依靠 2024-07-19 09:49:21

你为什么不使用Google Power?,你可以使用他们的建议服务

这里是一个c#示例

why don't you use google power?, you can consume their suggest service

here is an example on c#

卷耳 2024-07-19 09:49:20

您可以使用算法来确定字符串相似性,然后建议搜索索引中的其他字符串,直到一定的差异。

其中一种算法是Levenshtein 距离

但是,不要忘记寻找现有的解决方案。 我认为例如 Lucene 具有搜索相似字符串的能力。

顺便说一句,这里有一篇关于此主题的相关文章:Google 如何“你的意思?” 算法有效吗?

You could use an algorithm for determining string similarity and then suggest other string from your search index up to a certain difference.

One of these algorithms is the Levenshtein distance.

However, don't forget searching for existing solutions. I think e.g. Lucene has the capability to search for similar strings.

Btw, here's a related post on this topic: How does the Google “Did you mean?” Algorithm work?

眼眸里的那抹悲凉 2024-07-19 09:49:20

这是通过正则表达式查询与短语匹配的最接近的关键字来完成的。

这里是一篇很棒的文章,可能会对您有所帮助。

This is done querying through regular expression the closest keywords that match the phrase.

Here is a great article that might help you.

会发光的星星闪亮亮i 2024-07-19 09:49:20

对于 T-SQL,您可以使用 SOUNDEX 功能对单词进行语音比较。

如果您获取用户输入,然后通过 soundex 代码将其与数据库中的其他单词进行比较,您应该能够得出“您的意思是”的列表吗? 字。

例如,

select SOUNDEX('andrew')
select SOUNDEX('androo')

两者都会产生相同的输出(A536)。

现在有更好的算法,但 soundex 内置于 sql server 中。

With T-SQL You can use the SOUNDEX function to compare words phonetically.

If you take the users input and then compare it with other words in your database by soundex code, you should be able to come up with a list of 'do you mean'? words.

E.g.

select SOUNDEX('andrew')
select SOUNDEX('androo')

will both produce the same output (A536).

There are better algorithms these days, but soundex is built into sql server.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文