How to auto-populate related questions
I want to get related [things/questions] in my app, similar to what StackOverflow does when you tab out of the Title field.
I can think of only one way to do it that might be fast enough:
- Search for the title in the corpus of titles of all [things] and return the first x matches. We can use whatever search is already being used for site search.
What other ways are there to do this that are fast enough? The request is going to be sent on tab-out, so heavy server-side processing is not feasible.
I am just looking for a way to do this, but I am using MySQL and Django, so if your answer uses those, all the better.
[I cannot think of good tags for this, so please feel free to edit them]
Comments (1)
What you're describing is a content-based recommendation algorithm. AFAICT StackOverflow's algorithm looks at the tags and the words in the title, and finds questions that share some of these. It can be implemented as a nearest-neighbour search in a space where documents are represented as TF-IDF vectors.
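To make the nearest-neighbour idea concrete, here is a minimal pure-Python sketch. It assumes a small in-memory list of titles, a naive whitespace tokenizer (no stemming or stopwords), and a smoothed IDF formula; the function names `tokenize`, `build_tfidf`, `cosine`, and `related` are made up for illustration and are not part of Django or any search library.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_tfidf(titles):
    """Return (idf, vectors): an IDF table and one sparse TF-IDF dict per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency: count each word once per title
    # Smoothed IDF (one common variant, chosen here as an assumption).
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = [{w: tf * idf[w] for w, tf in doc.items()} for doc in docs]
    return idf, vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def related(query, titles, idf, vectors, top_n=3):
    # Words unseen in the corpus carry no weight and are dropped.
    counts = Counter(tokenize(query))
    q_vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    scored = [(cosine(q_vec, vec), title) for vec, title in zip(vectors, titles)]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]
```

This scans every title on each call, which is only reasonable for a small corpus; a real search engine answers the same query from an inverted index.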
Implementation-wise, go with any Django search engine that supports stemming, stopwords, non-strict matches, and TF-IDF weights. Algorithmic complexity isn't high (just a few index lookups), so it doesn't matter if it's written in Python.
If you don't find a search engine that does what you want, leave the stemming and stopwords to the search engine, call it on individual words, and do your own TF-IDF scoring with a score that favors similar tags.
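The per-word fallback could look roughly like the sketch below. An in-memory inverted index stands in for the search engine (so there is no stemming or stopword handling here), and the multiplicative `tag_bonus` is just one assumed way to favor similar tags. The names `build_index`, `related`, and `tag_bonus`, and the question dict layout, are hypothetical.

```python
import math
from collections import defaultdict

def build_index(questions):
    """Map each title word to {question id: term count}.

    Stand-in (assumption) for the per-word lookups a real search engine provides.
    """
    index = defaultdict(lambda: defaultdict(int))
    for qid, question in enumerate(questions):
        for word in question["title"].lower().split():
            index[word][qid] += 1
    return index

def related(title, tags, questions, index, tag_bonus=0.5, top_n=3):
    n = len(questions)
    scores = defaultdict(float)
    # One index lookup per distinct word in the new title.
    for word in set(title.lower().split()):
        postings = index.get(word)
        if not postings:
            continue
        idf = math.log(1 + n / len(postings))  # rarer words weigh more
        for qid, tf in postings.items():
            scores[qid] += tf * idf
    # Favor candidates that share tags (hypothetical weighting scheme).
    for qid in list(scores):
        shared = len(set(tags) & set(questions[qid]["tags"]))
        scores[qid] *= 1 + tag_bonus * shared
    ranked = sorted(scores, key=lambda qid: scores[qid], reverse=True)
    return [questions[qid]["title"] for qid in ranked[:top_n]]
```

Only questions that share at least one word with the new title are ever scored, which keeps the work per request down to a handful of lookups.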