Matching based on arbitrary categories and a similarity measure

Posted on 2024-10-30 20:41:25


I have a database of customers, each with certain attributes and a customer type. The set of attributes varies from customer to customer (though they all come from a finite set). Given a new customer with known attributes but unknown type, I would like to determine which type s/he belongs to. For example, say I already have these customers in the DB:

Customer | Type | Attributes
1        | A    | 44, 32, 5, 'X'
2        | A    | 3, 32, 66, 'A'
3        | B    | 6, 32, 'A', 'B'
4        | C    | 47, 31, 2, 'H'
5        | C    | 14, 32, 2, 'O'
6        | C    | 2, 'C'
7        | A    | 44

When I receive a new customer with attributes such as 3, 32, 2, I would like to determine which type this customer belongs to, and the code should report its confidence in the match as a percentage.

What is the best method to use here? Something statistical, a method based on some kind of affinity matrix, or a recommendation-engine-style approach based on Pearson correlation coefficients? Sample code or pseudocode would be most welcome, but any and all ideas are fine.
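
For reference, one simple baseline for this kind of problem can be sketched as follows. It is not taken from the question: it swaps in Jaccard similarity between attribute sets (rather than Pearson correlation or an affinity matrix), sums the similarity per type, and reports each type's share of the total as a percentage. The names `CUSTOMERS`, `jaccard`, and `classify` are hypothetical, and treating the normalized share as a "confidence" is an assumption, not a calibrated probability.

```python
from collections import defaultdict

# Known customers from the example table: (type, set of attributes).
CUSTOMERS = [
    ("A", {44, 32, 5, "X"}),
    ("A", {3, 32, 66, "A"}),
    ("B", {6, 32, "A", "B"}),
    ("C", {47, 31, 2, "H"}),
    ("C", {14, 32, 2, "O"}),
    ("C", {2, "C"}),
    ("A", {44}),
]


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(attributes, customers=CUSTOMERS):
    """Return (best_type, confidence_percent, per_type_percent).

    Assumption: "confidence" is the winning type's share of the summed
    similarity across all known customers, not a calibrated probability.
    """
    query = set(attributes)
    scores = defaultdict(float)
    for ctype, attrs in customers:
        scores[ctype] += jaccard(query, attrs)

    total = sum(scores.values())
    if total == 0:
        # No attribute overlaps with any known customer.
        return None, 0.0, {}

    shares = {t: 100.0 * s / total for t, s in scores.items()}
    best = max(shares, key=shares.get)
    return best, shares[best], shares


if __name__ == "__main__":
    best, confidence, shares = classify([3, 32, 2])
    print(best, round(confidence, 1), shares)
```

With the seven customers above and the new attributes 3, 32, 2, this sketch picks type C with a share of roughly 53%, ahead of A at about 37%. Summing over all customers favors types with many records; averaging per type or using only the k most similar customers are alternative choices.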

Thanks,
