PHP 中的皮尔逊相关系数
我正在尝试在php中实现两组数据之间人员相关系数的计算。 我只是尝试执行可以在此网址找到的移植 python 脚本 http://answers.oreilly.com/ topic/1066-how-to-find-similar-users-with-python/
我的实现如下:
class LB_Similarity_PearsonCorrelation implements LB_Similarity_Interface{
public function similarity($user1, $user2){
$sharedItem = array();
$pref1 = array();
$pref2 = array();
$result1 = $user1->fetchAllPreferences();
$result2 = $user2->fetchAllPreferences();
foreach($result1 as $pref){
$pref1[$pref->item_id] = $pref->rate;
}
foreach($result2 as $pref){
$pref2[$pref->item_id] = $pref->rate;
}
foreach ($pref1 as $item => $preferenza){
if(key_exists($item,$pref2)){
$sharedItem[$item] = 1;
}
}
$n = count($sharedItem);
if ($n == 0) return 0;
$sum1 = 0;$sum2 = 0;$sumSq1 = 0;$sumSq2 = 0;$pSum = 0;
foreach ($sharedItem as $item_id => $pre) {
$sum1 += $pref1[$item_id];
$sum2 += $pref2[$item_id];
$sumSq1 += pow($pref1[$item_id],2);
$sumSq2 += pow($pref2[$item_id],2);
$pSum += $pref1[$item_id] * $pref2[$item_id];
}
$num = $pSum - (($sum1 * $sum2) / $n);
$den = sqrt(($sumSq1 - pow($sum1,2)/$n) * ($sumSq2 - pow($sum2,2)/$n));
if ($den == 0) return 0;
return $num/$den;
}
}
澄清以更好地理解代码,方法 fetchAllPreferences 返回一组实际上是项目的对象,将它们变成数组以便于管理
我不确定这个实现是否正确,特别是我对分母计算的正确性有些怀疑。
欢迎任何建议。
提前致谢!
I'm trying to implement the calculation of correlation coefficient of people between two sets of data in php.
I'm just trying to do the porting python script that can be found at this url
http://answers.oreilly.com/topic/1066-how-to-find-similar-users-with-python/
my implementation is the following:
class LB_Similarity_PearsonCorrelation implements LB_Similarity_Interface{
public function similarity($user1, $user2){
$sharedItem = array();
$pref1 = array();
$pref2 = array();
$result1 = $user1->fetchAllPreferences();
$result2 = $user2->fetchAllPreferences();
foreach($result1 as $pref){
$pref1[$pref->item_id] = $pref->rate;
}
foreach($result2 as $pref){
$pref2[$pref->item_id] = $pref->rate;
}
foreach ($pref1 as $item => $preferenza){
if(key_exists($item,$pref2)){
$sharedItem[$item] = 1;
}
}
$n = count($sharedItem);
if ($n == 0) return 0;
$sum1 = 0;$sum2 = 0;$sumSq1 = 0;$sumSq2 = 0;$pSum = 0;
foreach ($sharedItem as $item_id => $pre) {
$sum1 += $pref1[$item_id];
$sum2 += $pref2[$item_id];
$sumSq1 += pow($pref1[$item_id],2);
$sumSq2 += pow($pref2[$item_id],2);
$pSum += $pref1[$item_id] * $pref2[$item_id];
}
$num = $pSum - (($sum1 * $sum2) / $n);
$den = sqrt(($sumSq1 - pow($sum1,2)/$n) * ($sumSq2 - pow($sum2,2)/$n));
if ($den == 0) return 0;
return $num/$den;
}
}
clarification to better understand the code, the method fetchAllPreferences return back a set of objects that are actually the items, turns them into an array for ease of management
I'm not sure that this implementation is correct, in particular I have some doubts about the correctness of the calculation of the denominator.
any advice is welcome.
thanks in advance!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这是我的解决方案:
http://profprog.ru/korrelyaciya-na -php-php-simple-pearson-correlation/
This is my solution:
http://profprog.ru/korrelyaciya-na-php-php-simple-pearson-correlation/
您的算法在数学上看起来是正确的,但在数值上不稳定。明确求平方和会导致灾难。如果您有像 array(10000000001, 10000000002, 10000000003) 这样的数字怎么办?用于计算方差的数值稳定的一次性算法可以在维基百科上找到,并且相同原理可以应用于计算协方差。
更简单的是,如果您不太关心速度,您可以只使用两次通过。在第一遍中求均值,然后在第二遍中使用教科书公式计算方差和协方差。
Your algorithm looks mathematically correct but numerically unstable. Finding the sum of squares explicitly is a recipe for disaster. What if you have numbers like
array(10000000001, 10000000002, 10000000003)
? A numerically stable one-pass algorithm for calculating the variance can be found on Wikipedia, and the same principle can be applied to computing the covariance.Easier yet, if you don't care much about speed, you could just use two passes. Find the means in the first pass, then compute the variances and covariances using the textbook formula in the second pass.
在这里尝试我的包
http://www.phpclasses.org/browse/package/5854。 html
try my package here
http://www.phpclasses.org/browse/package/5854.html