Solr relevance - How to do A/B testing of search quality?
I am looking to perform live A/B and controlled side-by-side experiments to help understand how changes affect search quality. I will be testing variables such as boost values and fuzzy queries.
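Here is a minimal sketch of how the live A/B split for those variables could be wired up. It assumes a Python front end that builds Solr request parameters. The variant names, the experiment salt, the 50/50 split and the `fuzzy_edits` setting are hypothetical. `defType`, `qf` (edismax query fields with `^` boosts) and the Lucene `term~N` fuzzy syntax are standard Solr/Lucene features, but check them against your Solr version. Hashing the user ID keeps each user in the same bucket across visits.

```python
import hashlib

# Hypothetical experiment variants. "defType" and "qf" are real Solr request
# parameters (edismax query fields with ^boosts); "fuzzy_edits" is our own
# setting that appends Lucene's fuzzy operator (term~N) to each query term.
VARIANTS = {
    "A": {"defType": "edismax", "qf": "title^1.0 body^1.0", "fuzzy_edits": 0},
    "B": {"defType": "edismax", "qf": "title^3.0 body^1.0", "fuzzy_edits": 1},
}


def assign_variant(user_id: str, experiment: str = "search-ab-1") -> str:
    """Deterministically bucket a user so they see the same variant on every visit."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "A" if bucket < 50 else "B"  # hypothetical 50/50 split


def build_params(user_query: str, variant: str) -> dict:
    """Build Solr request parameters for the given variant.

    Note: user input is not escaped here; escape Lucene special characters
    before sending real traffic.
    """
    cfg = VARIANTS[variant]
    terms = user_query.split()
    if cfg["fuzzy_edits"] > 0:
        terms = [f"{t}~{cfg['fuzzy_edits']}" for t in terms]
    return {"q": " ".join(terms), "defType": cfg["defType"], "qf": cfg["qf"]}


if __name__ == "__main__":
    variant = assign_variant("user-42")
    print(variant, build_params("running shoes", variant))
```

Changing the `experiment` salt reshuffles users for the next experiment, so one test's buckets do not carry over into the next.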
What other metrics are used to determine whether users prefer A vs B? Here are 2 metrics I found online...
In Google Analytics, “% Search Exits” is a metric you can use to measure the quality of your site-search results.

Another way to measure search quality is to measure the number of search result pages the visitor views.
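This sketch computes the two metrics above per variant from your own search logs. It also runs a two-proportion z-test to check whether the difference in exit rate between A and B is larger than chance. The `SearchEvent` log format is hypothetical. Google Analytics computes % Search Exits itself, so this only applies if you log searches yourself.

```python
import math
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged search (hypothetical log format)."""
    variant: str       # "A" or "B"
    result_pages: int  # number of search result pages viewed for this search
    exited: bool       # True if the visitor left the site from the results page


def summarize(events, variant):
    """Exit rate and average result pages viewed for one variant."""
    rows = [e for e in events if e.variant == variant]
    n = len(rows)
    exits = sum(1 for e in rows if e.exited)
    return {
        "searches": n,
        "exits": exits,
        "exit_rate": exits / n if n else 0.0,
        "avg_result_pages": sum(e.result_pages for e in rows) / n if n else 0.0,
    }


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # all outcomes identical (all exits or none): no evidence of a difference
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

To compare the variants, run `summarize` for "A" and "B" and pass each variant's `exits` and `searches` to `two_proportion_z_test`. A higher number of result pages viewed can mean engagement or can mean users did not find what they wanted on page 1, depending on your site. Decide which direction counts as "better" before running the test.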
Search quality is not easily measurable. To measure relevance, you need a couple of things:
A competitor to measure relevance against. In your case, the different instances of your search engine will be competitors of each other: one instance runs the basic algorithm, another has fuzzy matching enabled, another has both fuzzy matching and boosting, and so on.
You need to rate the results manually. You can ask your colleagues to rate query/URL pairs for popular queries. Then, for the holes (i.e. query/URL pairs that have not been rated), you can use a dynamic ranking function built with a "Learning to Rank" algorithm: http://en.wikipedia.org/wiki/Learning_to_rank. Don't be surprised by that, but it is true (see the Google/Bing example below).
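Here is a minimal pointwise learning-to-rank sketch for filling the holes. It fits a linear model that predicts a rater's grade from features of a query/URL pair, then uses that model to score unrated pairs. The features (for example the Solr score and whether the query matches the title) and the linear least-squares model are illustrative assumptions. Production LTR systems typically use pairwise or listwise methods, and Solr ships its own Learning to Rank module, which this code does not use.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pointwise(features, ratings):
    """Least-squares fit of rating ~ w0 + w1*f1 + ... via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend a bias column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_vec):
    """Predicted grade for an unrated query/URL pair."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vec))


if __name__ == "__main__":
    # Hypothetical rated pairs: (solr_score, title_match) -> judge grade
    rated = [(0.2, 0), (1.5, 1), (0.9, 0), (2.0, 1), (0.4, 1)]
    grades = [0, 3, 1, 3, 1]
    w = fit_pointwise(rated, grades)
    print(predict(w, (1.2, 1)))  # estimated grade for an unrated pair
```

With only a handful of features this closed-form fit is enough to try the idea. With many correlated features you would want regularization or a proper LTR library.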
Google and Bing are competitors in the horizontal search market. These search engines employ human judges around the world and invest millions in them to rate their results for queries. Generally, for each query, the top 3 or top 5 results (query/URL pairs) are rated. Based on these ratings, they may use a metric like NDCG (Normalized Discounted Cumulative Gain), which is one of the best and most popular metrics.
Wikipedia explains NDCG very well. It is a short article, so please go through it.
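This is one common form of the definition, DCG@k = sum over i = 1..k of rel_i / log2(i + 1) and NDCG@k = DCG@k / IDCG@k, sketched in Python. `relevances` are your judges' graded ratings in the order a variant returned the results. Here the ideal ordering comes from the judged results of the list itself, which is an assumption. If raters judged documents that a variant did not return, build the ideal list from all judged documents for the query.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over positions i = 1..k."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG of the returned ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judge grades (0-3) for one query under each variant
    print(ndcg([3, 2, 3, 0, 1, 2], k=5), ndcg([2, 3, 3, 1, 0, 2], k=5))
```

For example, the grade list [3, 2, 3, 0, 1, 2] gives an NDCG of about 0.961. Average NDCG over your sample of popular queries for each variant and compare the averages.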
As you mentioned, you can also use click-through rate/data, which gives you a kind of wisdom-of-the-crowd algorithm, and tweak relevance based on that. It is a very good approach, but it attracts spam, so it has to be combined with a metric such as NDCG or MAP to solve your relevance problem.
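MAP (mean average precision), the other judged metric mentioned here, needs only binary judgments (relevant or not). This minimal sketch computes average precision for one ranked list, then takes the mean over queries. By default it assumes every relevant document appears in the list. Pass `total_relevant` if you know of relevant documents a variant failed to return.

```python
def average_precision(relevant_flags, total_relevant=None):
    """AP for one ranked list of binary judgments (1 = relevant, 0 = not)."""
    total = total_relevant if total_relevant is not None else sum(relevant_flags)
    if total == 0:
        return 0.0
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant position
    return score / total


def mean_average_precision(ranked_lists):
    """MAP: mean of the per-query average precision values."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


if __name__ == "__main__":
    # Hypothetical judgments for two queries under one variant
    print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
```

Computing NDCG or MAP on judged queries alongside CTR for each variant gives you a check on click data that spam could inflate.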
I can provide more details if you need to know more about how all of this would fit together in your case study.