How to implement a recommendation engine?
Please bear with my writing, as my English is not very fluent.
As a programmer, I want to learn about the algorithms, or the machine learning techniques, that are implemented underneath recommendation systems and related systems. The most obvious example is Amazon. They have a really good recommendation system. They get to know: if you like this, you might also like that. Or something like: what percentage of people like this and that together?
Of course I know Amazon is a big website and they have invested a lot of brainpower and money in these systems. But at the very basic core, how can we implement something like that within our own database? How can we identify how one object relates to another? How can we build a statistical unit that handles this kind of thing?
I'd appreciate it if someone could point out some algorithms, or point out some good direct references or books that we can all learn from. Thank you all!
Comments (4)
There are two different types of recommendation engines.
The simplest is item-based, i.e. "customers that bought product A also bought product B". This is easy to implement. Store a sparse symmetric n×n matrix (where n is the number of items). Each element m[a][b] is the number of times anyone has bought item 'a' along with item 'b'.
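A minimal Python sketch of that matrix, assuming purchases are available as a list of orders where each order is a list of item names. The function name `build_cooccurrence` and the sample data are made up for illustration. A dictionary of dictionaries stands in for the sparse matrix, so only pairs that actually occur take up memory.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(orders):
    """Count how often each pair of items appears in the same order."""
    m = defaultdict(lambda: defaultdict(int))
    for order in orders:
        # set() so an item repeated within one order is counted once
        for a, b in combinations(sorted(set(order)), 2):
            m[a][b] += 1
            m[b][a] += 1  # mirror the entry to keep the matrix symmetric
    return m

# Hypothetical sample data: each inner list is one order.
orders = [
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "desk"],
]
m = build_cooccurrence(orders)
print(m["book"]["pen"])  # how many orders contained both "book" and "pen"
```

To answer "customers that bought A also bought B", look up row `m["A"]` and pick the items with the highest counts.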
The other is user-based. That is, "people like you often like things like this". A possible solution to this problem is k-means clustering, i.e. construct a set of clusters where users of similar taste are placed in the same cluster, and make suggestions based on the other users in the same cluster.
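The clustering idea can be sketched in plain Python. This is only an illustration under several assumptions: each user is represented by a fixed-length list of ratings (0 meaning "not rated"), the number of clusters `k` is chosen by hand, and the names `kmeans`, `recommend`, and the sample data are hypothetical. A real system would likely use a library implementation and handle missing ratings more carefully.

```python
import random

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(vectors, k, iterations=20, seed=0):
    """Return a cluster label (0..k-1) for each vector."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = mean(members)
    return labels

def recommend(user, ratings, items, cluster_of):
    """Items the user has not rated, ranked by cluster-mates' ratings."""
    peers = [u for u in ratings
             if u != user and cluster_of[u] == cluster_of[user]]
    scores = {}
    for i, item in enumerate(items):
        if ratings[user][i] == 0:
            total = sum(ratings[p][i] for p in peers)
            if total > 0:
                scores[item] = total
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings: one column per item, 0 = not rated.
items = ["sci-fi", "fantasy", "cooking", "gardening"]
ratings = {
    "ann": [5, 4, 0, 0],
    "bob": [5, 0, 0, 0],
    "cat": [0, 0, 5, 4],
    "dan": [0, 0, 4, 0],
}
users = list(ratings)
labels = kmeans([ratings[u] for u in users], k=2)
cluster_of = dict(zip(users, labels))
print(recommend("bob", ratings, items, cluster_of))
```

The cluster numbers themselves are arbitrary; what matters is which users end up sharing a cluster.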
A better, but even more complicated, solution is a technique called Restricted Boltzmann Machines. There's an introduction to them here.
A first attempt could look like this:
First I calculate how often each pair of products was bought together, and then I group them by the product and select the top 20 other products bought with it. The result should be put into some kind of dictionary keyed by product ID.
This might get too slow or use too much memory for large databases.
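The steps described above could be sketched in Python roughly as follows. This is not the answer's original code: it assumes orders are already loaded into memory as lists of product IDs, and the function name `top_copurchases` and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def top_copurchases(orders, top_n=20):
    """Map each product ID to its top_n most frequently co-purchased products."""
    # Step 1: count how often each pair of products was bought together.
    pair_counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pair_counts[(a, b)] += 1

    # Step 2: group the pairs by product, in both directions.
    by_product = defaultdict(list)
    for (a, b), count in pair_counts.items():
        by_product[a].append((b, count))
        by_product[b].append((a, count))

    # Step 3: keep the top_n partners, highest count first (ties by ID).
    return {
        product: sorted(others, key=lambda pc: (-pc[1], pc[0]))[:top_n]
        for product, others in by_product.items()
    }

# Hypothetical sample data: each inner list is one order of product IDs.
orders = [[1, 2, 3], [1, 2], [2, 3], [1, 4]]
related = top_copurchases(orders, top_n=2)
print(related[1])  # list of (other_product_id, times_bought_together)
```

The pair counter grows with the number of distinct pairs, which is where the memory concern comes from on large catalogues.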
I think you are talking about knowledge base systems. I don't remember the programming language (maybe LISP), but there are implementations. Also, look at OWL.
There's also prediction.io if you're looking for an open source solution, or SaaS solutions like mag3llan.com.