MySQL multivariable linear regression

Posted on 2024-08-18 16:45:46, 704 characters, 4 views, 0 comments

I am trying to do a multivariable (9 variables) linear regression on data in my MySQL 5.0 database (the result value field only has 2 possible values, 1 and 0).

I've done some searching and found I can use:

mysql> SELECT
    -> @n := COUNT(score) AS N,
    -> @meanX := AVG(age) AS "X mean",
    -> @sumX := SUM(age) AS "X sum",
    -> @sumXX := SUM(age*age) AS "X sum of squares",
    -> @meanY := AVG(score) AS "Y mean",
    -> @sumY := SUM(score) AS "Y sum",
    -> @sumYY := SUM(score*score) AS "Y sum of squares",
    -> @sumXY := SUM(age*score) AS "X*Y sum"

This gets me many of the basic regression quantities, but I really don't want to type it out for every combination of the 9 variables. All of the sources I can find on multivariable regression require matrix operations. Can I do matrix operations in MySQL, or are there other ways to do a 9-variable linear regression?

Should I export the data out of MySQL first? It's ~80,000 rows, so it would be all right to move it; I'm just not sure what else I should use.

Thanks,
Dan

Comments (2)

傲世九天 2024-08-25 16:45:46

It is fine to keep this data in MySQL, but you could process it from a language that has access to the database. Pseudocode:

variables = [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I' ];

# X and Y are column names substituted into the query text.
for X in $variables do
    for Y in $variables do
        query = 'SELECT
            @'+$X+$Y+' := COUNT('+$Y+') AS '+$X+$Y+',
            @mean'+$X+' := AVG('+$X+') AS "X mean",
            @sum'+$X+' := SUM('+$X+') AS "X sum",
            @sum'+$X+$X+' := SUM('+$X+'*'+$X+') AS "X sum of squares",
            @mean'+$Y+' := AVG('+$Y+') AS "Y mean",
            @sum'+$Y+' := SUM('+$Y+') AS "Y sum",
            @sum'+$Y+$Y+' := SUM('+$Y+'*'+$Y+') AS "Y sum of squares",
            @sum'+$X+$Y+' := SUM('+$X+'*'+$Y+') AS "X*Y sum"
            FROM measurements';
        db_execute(query);
    done
done

But why not store the results in a table? That is more appropriate for a database.

# One row per (X, Y) pair of column names.
# The per-row valX/valY columns are left out: raw values
# cannot meaningfully sit beside aggregates in the same SELECT.
for X in $variables do
    for Y in $variables do
        query = 'INSERT INTO regression
            SELECT
            "'+$X+'" AS X,
            "'+$Y+'" AS Y,
            COUNT('+$Y+') AS N,
            AVG('+$X+') AS meanX,
            SUM('+$X+') AS sumX,
            SUM('+$X+'*'+$X+') AS squareX,
            AVG('+$Y+') AS meanY,
            SUM('+$Y+') AS sumY,
            SUM('+$Y+'*'+$Y+') AS squareY,
            SUM('+$X+'*'+$Y+') AS sumXY
            FROM measurements';
        db_execute(query);
    done
done

Put a separate index on each of the X and Y columns.
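
Pairwise sums of this kind are the entries of the normal equations (X'X) b = X'y, so the matrix step the question asks about comes down to solving one small linear system. The following is a minimal Python sketch of that idea, not code from this answer: the toy rows and the helper names cross_product_sums and solve are made up for illustration, and the rows are held in memory instead of being fetched from MySQL.

```python
# Sketch: multiple linear regression from pairwise sums (normal equations).
# The data, function names, and row layout are illustrative assumptions.


def cross_product_sums(rows):
    """Return (XtX, Xty) for rows shaped (x1, ..., xk, y), with an intercept.

    Every entry is a plain SUM(a*b), i.e. what one SQL aggregate would return.
    """
    k = len(rows[0]) - 1
    xtx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xty = [0.0] * (k + 1)
    for row in rows:
        x = [1.0] + list(row[:k])  # leading 1.0 is the intercept term
        y = row[k]
        for i in range(k + 1):
            xty[i] += x[i] * y
            for j in range(k + 1):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


# Toy data generated from y = 1 + 2*x1 + 3*x2 with no noise.
rows = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
xtx, xty = cross_product_sums(rows)
coef = solve(xtx, xty)  # [intercept, b1, b2]
print([round(c, 6) for c in coef])
```

With 9 predictors plus an intercept the system is only 10x10, so the ~80,000 rows matter only for computing the sums. Note that this fits ordinary least squares, which treats the 1/0 outcome as a plain number.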

属性 2024-08-25 16:45:46

I would recommend moving the data out of MySQL and into R. With 1/0 response data, a logistic regression is much more appropriate, and it is not the simple sum-of-squares calculation you are implementing.

http://en.wikipedia.org/wiki/Logistic_regression

This seems to do a good job of showing how to solve the logistic regression problem:

http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm#_Toc147483467
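
To make the difference from a sum-of-squares fit concrete, here is a minimal Python sketch of logistic regression fitted by plain gradient ascent on the log-likelihood. It is an illustration only and not necessarily the method described at the link above: the toy data, the learning rate, the step count, and the function names fit_logistic and predict are all assumptions. In R, the built-in glm function with family = binomial is the usual way to fit such a model.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability that y = 1 for predictors x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit weights by gradient ascent on the mean log-likelihood.

    lr and steps are illustrative defaults, not tuned values.
    """
    k = len(xs[0])
    n = len(xs)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(xs, ys):
            err = y - predict(w, x)  # gradient contribution of one row
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


# Toy data with one predictor; the classes overlap, so the fit is finite.
xs = [(0.5,), (1.0,), (1.5,), (2.0,), (2.5,), (3.0,), (3.5,), (4.0,)]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w = fit_logistic(xs, ys)
print("weights:", w)
print("P(y=1 | x=0.5):", predict(w, (0.5,)))
print("P(y=1 | x=4.0):", predict(w, (4.0,)))
```

The same functions accept 9-element tuples for the 9 predictors. Unlike the least-squares sums, the fit is iterative, which is one reason doing it outside MySQL is simpler.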
