In the ALS example given in the PySpark documentation (http://spark.apache.org/docs/latest/ml-collaborative-filtering.html), the data used has explicit feedback in one column. The data looks like this:

| User | Item | Rating |
| --- | --- | --- |
| First | A | 2 |
| Second | B | 3 |
However, in my case I have implicit feedback in multiple columns, like this:

| User | Item | Clicks | Views | Purchase |
| --- | --- | --- | --- | --- |
| First | A | 20 | 35 | 3 |
| Second | B | 3 | 12 | 0 |
I know we can use implicit feedback by setting implicitPrefs to True. However, ALS only accepts a single rating column. How can I use multiple columns?

I found this question: How to manage multiple positive implicit feedbacks? However, it is not related to Spark or the Alternating Least Squares method. Do I have to manually assign a weighting scheme as per that answer, or is there a better solution in PySpark?
I have thoroughly researched your issue, and I haven't found a way to pass multiple columns to ALS; most such problems are solved by manually weighting the columns and creating a single Rating column.

Below is my solution:

Extract the smallest value of each column (excluding 0) and divide all elements of that column by it.

Example: the minimum value of the Purchase column is 3, so the scaled values are 3/3, 10/3, 20/3, etc.

Below is the formula for Rating:

Rating = 60% of Purchase + 30% of Clicks + 10% of Views
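The recipe above can be sketched in plain Python (a minimal illustration of the scaling and weighting steps, using the sample rows from the question; the 60/30/10 weights are the ones proposed here, not anything prescribed by Spark):

```python
# Sample data from the question: implicit feedback spread over three columns.
rows = [
    {"User": "First",  "Item": "A", "Clicks": 20, "Views": 35, "Purchase": 3},
    {"User": "Second", "Item": "B", "Clicks": 3,  "Views": 12, "Purchase": 0},
]

def min_nonzero(col):
    # Smallest value in the column, excluding zeros.
    return min(r[col] for r in rows if r[col] != 0)

# Step 1: scale each feedback column by its smallest non-zero value.
for col in ("Clicks", "Views", "Purchase"):
    m = min_nonzero(col)
    for r in rows:
        r[col] = r[col] / m

# Step 2: combine the scaled columns into one Rating column
# using the proposed weighting scheme.
for r in rows:
    r["Rating"] = 0.6 * r["Purchase"] + 0.3 * r["Clicks"] + 0.1 * r["Views"]
```

With the combined Rating column in place, you would train ALS on the (User, Item, Rating) triples as usual, setting implicitPrefs=True since the ratings are derived from implicit signals rather than explicit scores.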