Approaches to implementing "related searches" functionality
I've seen a few sites that list related searches when you perform a search; that is, they suggest other search queries you may be interested in.
I'm wondering about the best way to model this on a medium-sized site (one without enough traffic to rely on visitor stats to infer relationships). My initial thought is to store the top 10 results for each unique query and then, when a new search is performed, find all the historical searches that match some of the top 10 results but ideally not all of them (matching all of them might suggest an equivalent search, which is not that useful as a suggestion).
I imagine that some people have built this functionality before and may be able to suggest different ways to do it. I'm not necessarily looking for one winning idea, since the solution will no doubt vary substantially depending on the size and nature of the site.
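The top-10 overlap idea described in the question can be sketched roughly as follows. This is only an illustration under assumptions: the function name `related_searches`, the `history` mapping, and the `min_overlap` threshold are hypothetical, and a past query is treated as "equivalent" when it shares every one of the new query's results.

```python
from typing import Dict, List, Tuple


def related_searches(
    new_results: List[str],
    history: Dict[str, List[str]],
    min_overlap: int = 3,
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Suggest past queries whose stored top results partly overlap the new ones.

    `history` maps a past query string to its stored top result IDs
    (hypothetical storage format, assumed for this sketch).
    """
    new_set = set(new_results)
    scored = []
    for query, results in history.items():
        overlap = len(new_set & set(results))
        # Keep partial matches only: a past query that shares every result
        # is likely an equivalent search and not a useful suggestion.
        if min_overlap <= overlap < len(new_set):
            scored.append((query, overlap))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


history = {
    "python tutorial": ["d1", "d2", "d3", "d4"],
    "python guide": ["d1", "d2", "d3", "d9"],
    "java tutorial": ["d7", "d8"],
}
suggestions = related_searches(["d1", "d2", "d3", "d4"], history, min_overlap=2)
```

With sparse data the `min_overlap` threshold probably needs tuning per site; too high and nothing is suggested, too low and the suggestions become noisy.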
Comments (2)
Have you considered a matrix with keywords on one axis and documents on the other? Once you have the set of vectors representing the keywords, find the keywords that occur in your initial result set, then rank the other keywords by how many documents they reference or by how many times they intersect the initial result set.
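One possible reading of the keyword/document matrix suggestion, as a minimal sketch: the matrix is stored sparsely as an inverted index (keyword to set of document IDs), and other keywords are ranked by how many documents they share with the initial result set. The function names and the sample data are assumptions made for illustration, not something the comment specifies.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_keyword_index(docs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert doc -> keywords into keyword -> docs (a sparse keyword/document matrix)."""
    index: Dict[str, Set[str]] = {}
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            index.setdefault(keyword, set()).add(doc_id)
    return index


def rank_related_keywords(
    query_keywords: Set[str],
    result_docs: Iterable[str],
    index: Dict[str, Set[str]],
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Rank keywords not in the query by their overlap with the result set."""
    result_set = set(result_docs)
    scores = {}
    for keyword, doc_ids in index.items():
        if keyword in query_keywords:
            continue
        hits = len(doc_ids & result_set)
        if hits:
            scores[keyword] = hits
    # Most intersections first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Hypothetical sample data: document ID -> keywords it contains.
docs = {
    "d1": {"python", "tutorial", "beginner"},
    "d2": {"python", "django", "web"},
    "d3": {"python", "tutorial", "web"},
    "d4": {"java", "tutorial"},
}
index = build_keyword_index(docs)
related = rank_related_keywords({"python"}, index["python"], index)
```

Ranking by raw intersection count favors very common keywords; weighting by how many documents a keyword references overall is one way to counter that.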
I've tried a number of different approaches to this, with various degrees of success. In the end, I think the best approach is highly dependent on the domain/topics being searched, and how the users form queries.
Your thought about storing previous searches seems reasonable to me. I'd be curious to see how it works in practice (I mean that in the most sincere way -- there are many nuances that can cause these techniques to fail in the "real world", particularly when data is sparse).
Here are some techniques I've used in the past, and seen in the literature: