Ranking search keywords

Posted on 2024-12-04 11:56:22

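The question: how do I rank the keywords used in search queries in my web application, based on when and how often they were searched?

The user types a search query into a text box. Via AJAX I need to return some suggestions to the user. The suggestions are based on how many times the keyword has been searched, and should be ordered by most recent search.

For example, if the user types the search term "hang", the suggestions should appear in this order: "hangover part 2", "hangover".

How should I design the database to store the search queries? And how should I write the SQL query to fetch the suggestions?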


她如夕阳 2024-12-11 11:56:22

For query suggestions, a good approach is to count how often each search query occurs (it is probably better not to count repeated queries made by the same user). You end up with a (query, count) file or table like this:

"britney spears" 12
"lady xyz" 5
"billy joel" 27
"query abcdef" 2
"lady gaga" 39
...

Then you can sort the entries by count, in descending order:

"lady gaga" 39
"billy joel" 27
"britney spears" 12
"lady xyz" 5
"query abcdef" 2
...

Then, when someone searches for "lady", for example, scan the sorted list from top to bottom and keep the strings that start with that prefix. If you only want K suggestions, stop as soon as you have found K matches.
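The counting and top-down prefix scan described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(user_id, query)` log format, the lowercase normalisation, and the function names `count_queries` and `suggest` are all made up for the example and are not part of the original answer.

```python
from collections import Counter

def count_queries(log):
    """Count each query at most once per user.

    `log` is an iterable of (user_id, query) pairs (an assumed format).
    """
    unique_pairs = {(user, query.strip().lower()) for user, query in log}
    return Counter(query for _, query in unique_pairs)

def suggest(counts, prefix, k):
    """Return up to k queries that start with `prefix`, most frequent first."""
    if k <= 0:
        return []
    prefix = prefix.strip().lower()
    # Sort by descending count; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    suggestions = []
    for query, _ in ranked:
        if query.startswith(prefix):
            suggestions.append(query)
            if len(suggestions) == k:
                break  # stop as soon as the top K are found
    return suggestions

log = [
    ("u1", "lady gaga"), ("u2", "lady gaga"), ("u3", "lady gaga"),
    ("u1", "lady gaga"),  # repeated by the same user, counted once
    ("u1", "lady xyz"), ("u2", "billy joel"), ("u3", "billy joel"),
]
counts = count_queries(log)
print(suggest(counts, "lady", 2))  # ['lady gaga', 'lady xyz']
```

Sorting on every call keeps the sketch short; a real service would more likely keep the list sorted, or cache it, between requests.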

You could implement this using a simple file, or you can also have a counting query table and do a query similar to:

SELECT query FROM search_queries WHERE query LIKE 'prefix%' ORDER BY query_count DESC LIMIT K
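To make the table design concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`search_queries`, `query`, `query_count`) come from the query above; the helpers `record_search`, `escape_like`, and `suggest_sql`, and the choice of SQLite, are assumptions made for illustration. The `ORDER BY` sits on the outer query so that the ranking is applied to the rows actually returned.

```python
import sqlite3

def escape_like(text):
    """Escape LIKE wildcards so user input is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def record_search(conn, query):
    """Create the row if it is missing, then increment its counter."""
    conn.execute(
        "INSERT OR IGNORE INTO search_queries (query, query_count) VALUES (?, 0)",
        (query,),
    )
    conn.execute(
        "UPDATE search_queries SET query_count = query_count + 1 WHERE query = ?",
        (query,),
    )

def suggest_sql(conn, prefix, k):
    """Return up to k stored queries starting with `prefix`, most searched first."""
    rows = conn.execute(
        "SELECT query FROM search_queries "
        "WHERE query LIKE ? ESCAPE '\\' "
        "ORDER BY query_count DESC, query ASC LIMIT ?",
        (escape_like(prefix) + "%", k),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_queries ("
    " query TEXT PRIMARY KEY,"
    " query_count INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO search_queries (query, query_count) VALUES (?, ?)",
    [("britney spears", 12), ("lady xyz", 5), ("billy joel", 27),
     ("query abcdef", 2), ("lady gaga", 39)],
)
print(suggest_sql(conn, "lady", 2))  # ['lady gaga', 'lady xyz']
```

Passing the prefix as a bound parameter, rather than concatenating it into the SQL string, also keeps user input from being interpreted as SQL.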

Two notes:

There are better (and more difficult) ways of doing this. Amazon, for example, has pretty nice query suggestions.
The solution above will only suggest queries that start with the user's query. For example:
"lady" => ["lady gaga", "lady xyz"]

Query "lady" won't match "gaga lady". For them to match you will need query indexing, through the Full-Text Search support of your database or an external library such as Lucene.
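To illustrate what word-level matching means here, the sketch below builds a tiny hand-rolled word index in Python. It is not the full-text search of any database and not Lucene, only a stand-in to show the idea; the names `build_index` and `suggest_by_word` and the count for "gaga lady" are invented for the example.

```python
from collections import defaultdict

def build_index(counts):
    """Map every word to the set of stored queries that contain it."""
    index = defaultdict(set)
    for query in counts:
        for word in query.split():
            index[word].add(query)
    return index

def suggest_by_word(index, counts, prefix, k):
    """Return up to k queries in which any word starts with `prefix`."""
    matches = set()
    for word, queries in index.items():  # linear scan over the vocabulary
        if word.startswith(prefix):
            matches |= queries
    # Rank by descending count; ties are broken alphabetically.
    return sorted(matches, key=lambda q: (-counts[q], q))[:k]

counts = {"lady gaga": 39, "gaga lady": 4, "lady xyz": 5, "billy joel": 27}
index = build_index(counts)
print(suggest_by_word(index, counts, "lady", 3))
# ['lady gaga', 'lady xyz', 'gaga lady']
```

A real full-text engine adds tokenisation rules, stemming, and far more efficient lookups, which is why the answer points to existing tools rather than a home-grown index.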


予囚 2024-12-11 11:56:22


Ideally, you'd sort on something like the following:

order by sum(# of searches / (how long ago that search was performed + 1))

This would have to be adjusted so that "how long ago" is measured in an appropriate base unit of time. For example, if you want a search to count half as much after a week, measure age in weeks (one week = 1), so that a week-old search contributes 1 / (1 + 1) = 0.5.
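One possible reading of that formula, as a Python sketch: every individual search contributes 1 / (age + 1), with age expressed in the chosen base unit. The function name `recency_score` and the sample timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta

WEEK = timedelta(weeks=1)

def recency_score(search_times, now, unit=WEEK):
    """Sum 1 / (age + 1) over all searches, with age measured in `unit`s."""
    return sum(1.0 / ((now - t) / unit + 1.0) for t in search_times)

now = datetime(2024, 12, 11, 12, 0, 0)

# Hypothetical history: "hangover part 2" was searched twice just now,
# "hangover" three times four weeks ago.
history = {
    "hangover part 2": [now, now],
    "hangover": [now - 4 * WEEK] * 3,
}
ranked = sorted(history, key=lambda q: recency_score(history[q], now), reverse=True)
print(ranked)  # ['hangover part 2', 'hangover']
```

With these made-up numbers the less frequent but more recent query ranks first, which matches the ordering asked for in the question.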

This will clearly be inefficient, because computing the age of every individual search each time you rank suggestions is time consuming. Instead, you might keep a running total for each query and multiply all the totals by a fixed factor once per time period. For example, if you want searches to count as half after a week, add one to that query's total for every search, and have a scheduled process multiply the column by 0.5 every week. Then you just sort on that column.
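The running-total idea can be sketched in memory as follows. The class name `DecayedCounts` and its methods are hypothetical; in a database the same effect would presumably come from one extra numeric column plus a scheduled job that multiplies it. Note that repeated halving is an exponential decay, so it approximates the 1 / (age + 1) weighting rather than reproducing it exactly.

```python
class DecayedCounts:
    """Keep one decayed running total per query (in-memory sketch)."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.scores = {}

    def record(self, query):
        """Add one to the query's running total for every search."""
        self.scores[query] = self.scores.get(query, 0.0) + 1.0

    def apply_decay(self):
        """Run once per period (for example weekly) to age all totals."""
        for query in self.scores:
            self.scores[query] *= self.decay

    def top(self, prefix, k):
        """Return up to k queries starting with `prefix`, highest total first."""
        matches = [q for q in self.scores if q.startswith(prefix)]
        return sorted(matches, key=lambda q: (-self.scores[q], q))[:k]

totals = DecayedCounts(decay=0.5)
for _ in range(3):
    totals.record("hangover")
totals.apply_decay()  # one week passes
totals.apply_decay()  # another week passes
for _ in range(2):
    totals.record("hangover part 2")

print(totals.top("hang", 2))  # ['hangover part 2', 'hangover']
```

The ranking query then only needs to sort on the stored total, so no per-search timestamps have to be read at suggestion time.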
