Statistics, machine learning, and data mining

Posted on 2024-12-05 21:46:27

I am currently learning data mining and have the following questions.

  1. What is the relationship between machine learning and data mining?
  2. I have found that many data mining techniques are associated with statistics, yet I "hear" that data mining has a lot to do with machine learning. So my question is: is machine learning closely related to statistics?
  3. If they are not closely related, is there a division that separates data mining focused on statistical techniques from data mining focused on machine learning techniques? I ask because I have found that the statistics departments of some graduate schools offer data mining courses.

Comments (4)

长途伴 2024-12-12 21:46:28

Data mining is the process of extracting useful information from data, such as patterns, trends, customer/user behavior, likes/dislikes, etc. This involves the use of algorithms related to artificial intelligence and statistics.

Wikipedia's definition of Data Mining is:

Data Mining (the analysis step of the Knowledge Discovery in Databases
process,[1] or KDD), a relatively young and interdisciplinary field of
computer science,[2][3] is the process of discovering new patterns
from large data sets involving methods from statistics and artificial
intelligence but also database management. In contrast to for example
machine learning, the emphasis lies on the discovery of previously
unknown patterns as opposed to generalizing known patterns to new
data.

Machine learning involves making computers "learn" those behaviors, trends, etc., and act accordingly. For example, in credit card fraud detection, the computer "learns" the behavior of a customer, and if something strange occurs (a transaction involving a very high amount, etc.), it flags that transaction as potential fraud.
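
As a rough illustration of the fraud example, here is a minimal Python sketch. It is not how real fraud systems work and it is not from the answer above: the function names, the made-up transaction history, and the rule "flag anything more than 3 standard deviations above the customer's mean" are all assumptions chosen only to show the idea of learning a behavior from data and then acting on it.

```python
from statistics import mean, stdev


def learn_profile(amounts):
    """'Learn' a customer's usual spending as a (mean, standard deviation) pair."""
    return mean(amounts), stdev(amounts)


def is_suspicious(amount, profile, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations above the mean.

    The threshold of 3.0 is an arbitrary illustrative choice.
    """
    mu, sigma = profile
    if sigma == 0:
        # No variation in the history: anything above the usual amount stands out.
        return amount > mu
    return (amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
history = [23.5, 40.0, 18.2, 35.9, 27.4, 31.0, 22.8, 44.1]
profile = learn_profile(history)

print(is_suspicious(30.0, profile))   # an ordinary amount
print(is_suspicious(950.0, profile))  # a very high amount
```

The "learning" here is just estimating two statistics from past data, which is also a small example of how directly this kind of method rests on statistics.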

Wikipedia's definition of machine learning is:

Machine learning, a branch of artificial intelligence, is a scientific
discipline concerned with the design and development of algorithms
that allow computers to evolve behaviors based on empirical data, such
as from sensor data or databases. Machine Learning is concerned with
the development of algorithms allowing the machine to learn via
inductive inference based on observing data that represents incomplete
information about statistical phenomenon. Classification which is also
referred to as pattern recognition, is an important task in Machine
Learning, by which machines “learn” to automatically recognize complex
patterns, to distinguish between exemplars based on their different
patterns, and to make intelligent decisions.

Machine learning uses data mining to learn patterns, behaviors, trends, etc., because data mining is the way of extracting this information from a set of data. Data mining and machine learning both use statistics to make decisions. So yes, statistics is involved in, and very important to, both data mining and machine learning.

不再见 2024-12-12 21:46:28

There tends to be a lot of overlap between what different people call machine learning, data mining and statistics. The very definitions of the terms would depend on whom you ask.

Here is a nice overview, with lots of great links.

dawn曙光 2024-12-12 21:46:28

Although data mining and machine learning overlap, we can distinguish between them. Put simply:
Data mining searches for patterns to predict and/or describe huge amounts of data;
machine learning goes further and uses these patterns to learn.
Both are based on statistics.

各自安好 2024-12-12 21:46:28

A comprehensive answer was already given by @SpeedBirdNine. As a side note:

  • Data mining and machine learning are mainly based on the old but ingenious ideas of statisticians (inferential statistics, decision theory, etc.).
  • Classical statistics + today's powerful computers = DM & ML
  • Since we are living in the era of big data, the barrier statisticians used to face, namely the lack of enough data, is no longer an issue. Therefore, in many cases (but of course not all), it is safe to say that data mining/machine learning is the new statistics! (The infinity symbol ∞ in their equations, meaning that if the sample size n goes to infinity then everything's behavior becomes predictable (!), is no longer an unrealistic idealization.)
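
The remark about the sample size n going to infinity can be made concrete with the law of large numbers: the sample mean gets closer to the true mean as n grows. The Python sketch below is only an illustration under assumed conditions (a simulated fair six-sided die, whose true mean is 3.5, and a fixed random seed); none of it comes from the answer itself.

```python
import random


def sample_mean(n, rng):
    """Mean of n simulated rolls of a fair six-sided die (true mean is 3.5)."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    estimate = sample_mean(n, rng)
    error = abs(estimate - 3.5)
    print(f"n={n:>7}  sample mean={estimate:.4f}  error={error:.4f}")
```

With small n the estimate can be well off; with large n it typically lands very close to 3.5, which is the sense in which "enough data" makes behavior predictable.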

Regarding your last question: in my opinion, any meaningful research either applies statistical methods to big data, which is where DM/ML comes in handy, or applies a DM/ML method that was itself designed on the basis of classical statistics. Every DM/ML study involves these two parts, and statistics is not excluded from either, let alone when the goal is to come up with a novel DM/ML algorithm to analyze/cluster/classify big data.
