What is the difference between machine learning and statistics?
In the 2010 Turing Lecture, Christopher Bishop talks about machine learning undergoing a revolution because statistics is being applied to machine learning algorithms...
But then it seems that all machine learning algorithms are statistical algorithms. What is the real difference between the two? Why are they separate courses in most universities?
Comments (9)
Statistics bases everything on probability models. A typical analysis starts by assuming your data are samples from a random variable with some distribution, and then makes inferences about the parameters of that distribution.
Machine learning may use probability models, and when it does, it overlaps with statistics. But machine learning isn't so committed to probability. It is willing to also use other approaches to problem solving that are not based on probability.
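To make the contrast concrete, here is a minimal sketch using only the Python standard library. The data, the labels, and the helper name nearest_neighbour are made up for illustration and are not from the discussion above. The first half assumes a normal model and infers its mean; the second half predicts with a nearest-neighbour rule that never mentions a distribution.

```python
import math
import statistics

# Hypothetical measurements, assumed for illustration only.
heights = [168.0, 172.5, 165.0, 180.0, 175.5, 170.0, 169.5, 177.0]

# Statistical view: treat the data as i.i.d. draws from a normal
# distribution and infer its mean with an approximate 95% interval.
n = len(heights)
mean = statistics.mean(heights)
std_err = statistics.stdev(heights) / math.sqrt(n)
# 1.96 is the normal-approximation quantile; for a sample this small
# a t quantile would give a slightly wider, more accurate interval.
ci = (mean - 1.96 * std_err, mean + 1.96 * std_err)

# Non-probabilistic view: predict the label of the closest known example.
def nearest_neighbour(train, x):
    """Return the label of the training point whose value is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical labelled examples: (height, label).
labelled = [(150.0, "short"), (160.0, "short"), (175.0, "tall"), (185.0, "tall")]

print("estimated mean:", mean, "approx. 95% interval:", ci)
print("prediction for 178.0:", nearest_neighbour(labelled, 178.0))
```

The interval is a statement about a parameter of an assumed distribution. The nearest-neighbour rule makes no such statement, it only returns an answer for a new input.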
There isn't a great deal of difference between the two, and what there is is mostly cultural. Machine Learning came from Computer Science roots whereas Statistics is more mathematical. There's a nice blog post called "Statistics vs. Machine Learning, fight!" by Brendan O'Connor that talks about this.
As for non-statistical approaches to machine learning, well there are several rule-based approaches (decision trees, rule induction, ILP) and there are also approaches like reinforcement learning for control problems. Those don't feel very statistical to me, but you could claim that they are... you could probably claim all of life falls under statistical decision theory if you wanted to (in fact, Marcus Hutter does).
I can see some important differences:
#Scope: Machine learning uses statistical models, but it also uses other models such as dynamic programming, reinforcement learning, techniques that came from Artificial Intelligence or optimization.
#Point of View: Statistics is usually concerned with the properties of estimators (unbiasedness, asymptotic behavior), while machine learning is mainly concerned with solving real-world problems.
#Research field: While Statistics can be seen as a subfield of Applied Mathematics, Machine Learning can be seen as a subfield of Computer Science.
#Code development and application: While people who work with statistics usually have a preference for R (or SAS, STATA, EVIEWS), people who work with machine learning usually choose Python (or another structured programming language).
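As a small illustration of what "properties of estimators" means, here is a sketch that checks unbiasedness by simulation. The sample size, noise level, seed, and function names are my own assumptions. It compares the variance estimator that divides by n with the one that divides by n - 1, averaged over many repeated samples from a normal distribution with true variance 4.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

def simulate(trials=20000, n=5, sigma=2.0, seed=0):
    """Average both estimators over many samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased, unbiased = variance_estimates(sample)
        biased_total += biased
        unbiased_total += unbiased
    return biased_total / trials, unbiased_total / trials

biased_avg, unbiased_avg = simulate()
# Theory: the divide-by-n estimator averages (n - 1) / n * sigma^2 = 3.2 here,
# while the divide-by-(n - 1) estimator averages sigma^2 = 4.0.
print("divide by n:    ", biased_avg)
print("divide by n - 1:", unbiased_avg)
```

A statistician cares that the second estimator is right on average. A machine learning practitioner would more likely ask which one gives better predictions on held-out data.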
Maybe it's worth pointing out that a similar question is being addressed and discussed at CrossValidated.
Statistics covers all aspects of data analysis: descriptive, exploratory, inferential, predictive, and causal. Machine learning, by contrast, focuses only on predictive modeling.
Machine Learning is
An algorithm that can learn from data without relying on rules-based programming.
A subfield of computer science and artificial intelligence which deals with building systems that can learn from data, instead of explicitly programmed instructions.
Statistical Modelling is
Formalization of relationships between variables in the form of mathematical equations.
A subfield of mathematics which deals with finding relationships between variables in order to predict an outcome.
A machine learning system is truly a learning system if it is not programmed to perform a task, but is programmed to learn to perform the task. It is a data-driven exercise. Modern machine learning does not rely on a rich set of algorithmic techniques. Almost all applications of this form of machine learning are based on deep neural networks. This is the area we now tend to call Deep Learning, a specialization of Machine Learning, and frequently applied in weak Artificial Intelligence applications, where machines perform a human task.
In ML, the idea is that you build a separate model for the situation where you have the data versus the one where you don't.
Statistics, on the other hand, is about keeping the data that you have and getting the best result out of that data.
The difference in philosophy affects how you treat outliers.
In ML, you go out and find enough outliers that they become something you can actually train with.
With statistics you say, "I've got all the data I'll ever be able to collect." So, you throw out outliers. It's a philosophical difference that comes from the scenarios where ML and statistics are used.
Statistics is often used in a limited-data regime, whereas ML operates with lots of data.
Machine learning:
Machine learning is the science of making computers learn and act like humans by feeding them data and information, without their being explicitly programmed.
Example:
Normally, when we sit down at a computer, we write a piece of code or a program that tells the computer what to do step by step. With ML we don't do that; the system learns on its own. We just provide past data (called labelled data), and the system learns from it in what is known as the training process. We tell the system whether its outcomes are right or wrong; the system takes that feedback and corrects itself, and that is how it learns. It gives the correct output in most cases. Obviously it is not 100% correct, but the aim is to be as accurate as possible.
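The feedback loop described here (predict, get told right or wrong, correct) can be sketched with a perceptron, which is one simple algorithm that works this way. The toy data set, the function names, and the parameters epochs and lr are my own assumptions, chosen only to illustrate the idea.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights from labelled examples by correcting each mistake."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            # Feedback: error is 0 when the guess was right,
            # +1 or -1 when it was wrong.
            error = label - predicted
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(model, point):
    """Apply the learned weights to a new input."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0

# Hypothetical labelled data: the label is 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(examples)

for point, label in examples:
    print(point, "expected", label, "predicted", predict(model, point))
```

Nowhere in the code is the rule "output 1 when both inputs are 1" written down. The weights that encode it come out of the corrections made during training.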
Statistics:
It is a field of mathematics which is used to find the relationship between different variables.
Main difference:
Statistics: Focuses on formalising relationships between variables in the form of mathematical equations.
Machine learning: Comprises algorithms that can learn from data without relying on rule-based programming.
Machine learning was developed by computer scientists, while statistics was developed by mathematicians.
Machine learning is built upon statistical frameworks.
Statistics dates back to the 17th century; the term "machine learning" dates to 1959.
Machine learning is a subfield of Artificial Intelligence. Statistics is a subfield of Mathematics.
Machine learning finds generalizable predictive patterns, while statistics draws population inferences from a sample.
Machine learning is a black-box approach. Statistics opens the black box.
Machine learning needs a very large amount of data and attributes, while statistics needs less.
Statistics requires mathematical knowledge. Machine learning requires both mathematical and algorithmic knowledge.
Statistics uses the correlation between data points, while machine learning is used for making a hypothesis.
ML makes fewer assumptions than statistics.
Machine learning has more predictive power.
Machine learning requires less human effort than statistics.
Machine learning uses algorithms. Statistics uses equations.
They use different tools.
You can find more in this article I found: https://www.thejay.tech/2020/01/the-actual-difference-between.html
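One way to see the "generalizable predictive patterns" versus "population inference" point from the list above is to ask both kinds of question of the same fitted line. This is only a sketch on synthetic data that I made up (y = 3x + 1 plus noise). The split sizes, the seed, and the function name fit_line are assumptions.

```python
import math
import random

rng = random.Random(42)
# Synthetic data, assumed for illustration: y = 3x + 1 plus Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 3.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in xs]
train, test = data[:150], data[150:]

def fit_line(points):
    """Ordinary least squares for one predictor; returns slope, intercept, Sxx."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, sxx

slope, intercept, sxx = fit_line(train)

# ML-style question: how well does the model predict data it has not seen?
test_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)

# Statistics-style question: how uncertain is the slope itself?
residual_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in train)
sigma2 = residual_ss / (len(train) - 2)  # residual variance estimate
slope_se = math.sqrt(sigma2 / sxx)       # standard error of the slope
slope_ci = (slope - 1.96 * slope_se, slope + 1.96 * slope_se)

print("held-out mean squared error:", test_mse)
print("slope:", slope, "approx. 95% interval:", slope_ci)
```

The model and the arithmetic are identical in both halves. What differs is which number you report and care about.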