MySQL multivariable linear regression
I am trying to do a multivariable (9 variables) linear regression on data in my MySQL 5.0 database (the result value field has only 2 possible values, 1 and 0).
I've done some searching and found I can use:
mysql> SELECT
-> @n := COUNT(score) AS N,
-> @meanX := AVG(age) AS "X mean",
-> @sumX := SUM(age) AS "X sum",
-> @sumXX := SUM(age*age) "X sum of squares",
-> @meanY := AVG(score) AS "Y mean",
-> @sumY := SUM(score) AS "Y sum",
-> @sumYY := SUM(score*score) "Y sum of square",
-> @sumXY := SUM(age*score) AS "X*Y sum"
This gets me many of the basic regression quantities, but I really don't want to type this out for every combination of the 9 variables. All of the sources I can find on regression with multiple variables require matrix operations. Can I do matrix operations in MySQL, or are there other ways to do a 9-variable linear regression?
Should I export the data out of MySQL first? It's ~80,000 rows, so moving it would be fine; I'm just not sure what else I should use.
Thanks,
Dan
Comments (2)
It is fine to keep this data in MySQL, but you could process it from a language that has access to the database. Pseudocode:
But why not store the results in a table? That is more appropriate for a database.
Put separate indexes on both the X and the Y columns.
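As a rough illustration of processing the data from a language with database access, here is a minimal Python sketch that accumulates the X'X and X'y sums row by row and then solves the normal equations for an ordinary least-squares fit. It is only a sketch under stated assumptions: the table name `samples`, the columns `x1`, `x2`, `y`, and the helper names `solve` and `fit_ols` are all hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """rows: iterable of (x1, ..., xk, y). Returns [intercept, b1, ..., bk]."""
    xtx = xty = None
    for row in rows:
        feats = [1.0] + [float(v) for v in row[:-1]]  # leading 1.0 is the intercept term
        y = float(row[-1])
        if xtx is None:
            k = len(feats)
            xtx = [[0.0] * k for _ in range(k)]
            xty = [0.0] * k
        for i, fi in enumerate(feats):
            xty[i] += fi * y
            for j, fj in enumerate(feats):
                xtx[i][j] += fi * fj
    if xtx is None:
        raise ValueError("no rows to fit")
    return solve(xtx, xty)


# Stand-in database (assumption): SQLite in memory instead of MySQL,
# with a hypothetical table and synthetic data y = 1 + 2*x1 - 3*x2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x1 REAL, x2 REAL, y REAL)")
data = [(a, b, 1.0 + 2.0 * a - 3.0 * b) for a in range(5) for b in range(5)]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", data)

coefs = fit_ols(conn.execute("SELECT x1, x2, y FROM samples"))
print([round(c, 6) for c in coefs])
```

Because `fit_ols` only iterates over rows, the same function should work on rows fetched from a MySQL table through whatever driver is available, and it extends to 9 predictors by selecting 9 columns before the response. The sums it accumulates are the multi-variable analogue of the `SUM(age*score)` style aggregates in the question.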
I would recommend moving the data out of MySQL and into R. With 1/0 response data, a logistic regression is much more appropriate, and it is not the simple sum-of-squares calculation you are implementing.
http://en.wikipedia.org/wiki/Logistic_regression
This seems to do a good job of showing how to solve the logistic regression problem:
http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm#_Toc147483467
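To make the difference from a sum-of-squares fit concrete, here is a small Python sketch of logistic regression fitted by plain batch gradient ascent on the log-likelihood. This is an illustrative assumption, not the method from the linked page and not what R does internally; in R the usual route is `glm` with `family = binomial`. The function names, learning rate, epoch count, and toy data are all made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """xs: list of feature lists, ys: list of 0/1 labels.

    Returns the weights with the intercept first, found by batch
    gradient ascent on the mean log-likelihood (hypothetical settings).
    """
    k = len(xs[0]) + 1
    n = len(xs)
    w = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for x, y in zip(xs, ys):
            feats = [1.0] + list(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            for i, fi in enumerate(feats):
                grad[i] += (y - p) * fi  # gradient of the log-likelihood
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability that the response is 1 for features x."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Toy data (assumption): one predictor, label 1 mostly when x > 0,
# with two labels flipped so the classes are not perfectly separable.
xs = [[v / 2.0] for v in range(-10, 11)]
ys = [1 if x[0] > 0 else 0 for x in xs]
ys[8], ys[12] = 1, 0

w = fit_logistic(xs, ys)
print(w, predict(w, [4.0]), predict(w, [-4.0]))
```

The fitted model returns a probability between 0 and 1 for each row, which is what a 1/0 outcome calls for; a linear fit on the same data can produce predictions outside that range.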