How do Stack Overflow suggestions work?
What is the theory behind the algorithms that, for example, generate the suggestions for similar questions on Stack Overflow while you are writing one? Could you recommend some books on the subject?
Answers (2)
The algorithms you are asking about are found mainly in three branches of AI: NLP (natural language processing), ML (machine learning) and IR (information retrieval).
For example, to find the 10 questions most similar to a new question, one could extract n-grams from the text of each question, compute a TF-IDF weight vector over each question's n-grams, then compute the cosine similarity between the new question and every other question, and choose the 10 questions with the highest similarity.
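The steps in that example can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how Stack Overflow actually does it: the function names (`ngrams`, `build_idf`, `tfidf_vector`, `cosine`, `most_similar`), the word-level unigrams and bigrams, and the smoothed IDF formula are all choices made for this sketch.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (unigrams up to n_max-grams) of a lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def build_idf(docs_grams):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    n_docs = len(docs_grams)
    df = Counter(g for grams in docs_grams for g in set(grams))
    return {g: math.log((1 + n_docs) / (1 + count)) + 1.0 for g, count in df.items()}


def tfidf_vector(grams, idf):
    """Sparse TF-IDF vector as a dict; n-grams unseen in the corpus are ignored."""
    tf = Counter(g for g in grams if g in idf)
    return {g: count * idf[g] for g, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_question, questions, k=10):
    """Return up to k (score, question) pairs, highest similarity first."""
    docs_grams = [ngrams(q) for q in questions]
    idf = build_idf(docs_grams)
    vectors = [tfidf_vector(g, idf) for g in docs_grams]
    query = tfidf_vector(ngrams(new_question), idf)
    scored = [(cosine(query, v), q) for v, q in zip(vectors, questions)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Calling `most_similar("sorting a list in Python", existing_questions)` returns the best-scoring existing questions first. Comparing against every question is fine for a sketch; a real site would more likely use an inverted index or a search engine so that it does not scan the whole corpus on every keystroke.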
Some free books you can read:
http://nlp.stanford.edu/IR-book/
http://infolab.stanford.edu/~ullman/mmds.html
And two free courses starting in late January:
http://www.nlp-class.org/
http://jan2012.ml-class.org/
Also (kind of involved):
http://see.stanford.edu/see/courseinfo.aspx?coll=63480b48-8819-4efd-8412-263f1a472f5a
http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
I think it has to do with association rule mining, which originated in market basket analysis. For a good reference, Web Data Mining by Bing Liu is definitely one of the best.
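To give a feel for what association rule mining computes, here is a minimal sketch that derives pairwise rules from "baskets" of items using the two standard measures, support and confidence. Whether Stack Overflow uses anything like this is not known; the function name `pair_rules`, the thresholds, and the restriction to item pairs are assumptions made for illustration (full algorithms such as Apriori also handle larger itemsets).

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the baskets containing lhs, how many also contain rhs
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

In a question-suggestion setting, a "basket" might be the set of tags on a question or the set of questions viewed in one session, so a rule such as `butter -> bread` would read as "users who looked at X also looked at Y".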