Statistics, machine learning, and data mining
I am currently learning data mining and have the following questions.
- What is the relationship between machine learning and data mining?
- I found that many data mining techniques are associated with statistics, while I "hear" that data mining has a lot to do with machine learning. So my question is: is machine learning closely related to statistics?
- If they are not closely related, is there a division that separates data mining focused on statistical techniques from data mining focused on machine learning techniques? I ask because the statistics departments of some graduate schools offer data mining courses.
Comments (4)
Data mining is the process of extracting useful information from data, such as patterns, trends, customer/user behavior, likes/dislikes, etc. This involves the use of algorithms related to artificial intelligence and statistics.
Wikipedia's definition of data mining is:
Machine learning involves making computers "learn" that behavior, trend, etc., and act accordingly. For example, in credit card fraud detection, the computer "learns" the behavior of a customer, and if something strange occurs (a transaction involving a very high amount, etc.), it flags that transaction as potential fraud.
Wikipedia's definition of machine learning is:
Machine learning uses data mining to learn the pattern, behavior, trend, etc., because data mining is the way of extracting this information from a set of data. Data mining and machine learning both use statistics to make decisions. So yes, statistics is involved and is very important in both data mining and machine learning.
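To make the credit card example above a bit more concrete, here is a minimal sketch of how a program might "learn" a customer's usual spending and flag an unusual transaction. It is only an illustration of the idea, not how real fraud detection systems work: the transaction amounts are made up, the function names `fit_profile` and `is_suspicious` are hypothetical, and the rule used (a z-score with a threshold of 3 standard deviations) is just one simple statistical choice among many.

```python
from statistics import mean, stdev


def fit_profile(amounts):
    """'Learn' a customer's behavior as the mean and standard deviation
    of their past transaction amounts (a deliberately simple model)."""
    return mean(amounts), stdev(amounts)


def is_suspicious(amount, profile, threshold=3.0):
    """Flag a transaction whose amount lies more than `threshold`
    standard deviations away from the customer's usual amount."""
    mu, sigma = profile
    if sigma == 0:
        # All past amounts were identical; anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Made-up purchase history for one customer (illustrative data only).
history = [42.0, 18.5, 60.0, 35.25, 27.8, 51.1, 44.9, 30.0]
profile = fit_profile(history)

for amount in (48.0, 900.0):
    label = "flag for review" if is_suspicious(amount, profile) else "looks normal"
    print(f"{amount:>7.2f}: {label}")
```

Even in this toy version, the "learning" step is nothing more than estimating statistics from data, which is the point made above about statistics underlying both fields.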
There tends to be a lot of overlap between what different people call machine learning, data mining, and statistics. The very definitions of the terms depend on whom you ask.
Here is a nice overview, with lots of great links.
Although data mining and machine learning overlap, we can distinguish between them. Put simply:
Data mining searches for patterns in order to predict and/or describe huge data sets.
Machine learning goes further and uses these patterns to learn.
Both are based on statistics.
A comprehensive answer was already given by @SpeedBirdNine. As a side note:
Regarding your last question: in my opinion, any meaningful research either needs to apply some statistical methods to big data, which is when DM/ML comes in handy, or needs to apply a DM/ML method that was itself designed on the basis of classical statistics. These are the two parts that every DM/ML research project involves, and statistics is excluded from neither, least of all when the goal is to come up with a novel DM/ML algorithm to analyze/cluster/classify big data.