What is a model in data mining?
I want to know what exactly a model is in data mining. Can anyone explain that?
When I use Weka, I take my data, choose a method, and generate a model by clicking the Start button. Can anyone explain what is behind this model and how it works after it has been generated? Does it, for example, use my chosen method to classify new examples?
Can someone please explain these things?
Comments (2)
The model simply describes the information that is used when trying to deal with new data. In a simple spam detection scenario the algorithm determines which words seem to point to spam and which don't by looking at annotated emails. The lists of words then form your model.
When you receive new emails, you don't compare them with the other real emails. Instead, you take each new email's words and check them against your model (the word lists) to see whether they seem to indicate spam or not. As you can see, you become independent of your training data: you now have a piece of knowledge that tries to model the whole "spam vs. non-spam" reality.
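The word-list idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how Weka or any real spam filter works: the function names `train` and `is_spam`, the rule "a word belongs to the class where it appears more often", and the four training emails are all made up for the example.

```python
from collections import Counter

def train(emails):
    """Build a model (two word sets) from (text, label_is_spam) pairs."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label_is_spam in emails:
        words = set(text.lower().split())
        # Count each word once per email, in the bucket for its label.
        (spam_counts if label_is_spam else ham_counts).update(words)
    vocab = set(spam_counts) | set(ham_counts)
    return {
        "spam_words": {w for w in vocab if spam_counts[w] > ham_counts[w]},
        "ham_words": {w for w in vocab if ham_counts[w] > spam_counts[w]},
    }

def is_spam(model, text):
    """Classify a new email using only the model, not the training emails."""
    words = set(text.lower().split())
    spam_hits = len(words & model["spam_words"])
    ham_hits = len(words & model["ham_words"])
    return spam_hits > ham_hits

# Invented, annotated training data.
training_emails = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]

model = train(training_emails)
print(is_spam(model, "claim your free money"))  # True
print(is_spam(model, "agenda for lunch"))       # False
```

Once `train` has run, `training_emails` could be thrown away; `is_spam` only needs the two word sets stored in `model`.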
Suppose there are only the following variables related to music: guitar solos (has/hasn't), sudden tone changes (has/hasn't), vocals (has/hasn't, male/female), drums (has/hasn't, regular/electronic).
Now, let's suppose that you enjoy music when it has guitar solos, sudden tone changes, female vocals, and electronic drums. On the other hand, I appreciate music when it has guitar solos, sudden tone changes, no vocals, and regular drums.
Those preferences can be thought of as our models for enjoying music.
Now, suppose there's a song which has guitar solos, sudden tone changes, female vocals, and electronic drums. If we had to tell whether you would enjoy this song, the answer would be yes: that's a 100% match. But what about me? Well, I appreciate 3 of the 5 features of the song, so I'd likely enjoy it.
Deciding whether or not someone appreciates the song, as we did above, can be regarded as a classification task in machine learning. If instead we had to group everyone according to their musical preferences over the features above, we'd be clustering the music listeners, and so on.
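The matching described above can be written down as a small Python sketch. The encoding is an assumption chosen for illustration: each preference is a dictionary, a value of `None` means "no preference on this feature" and is skipped, and the function name `match_score` is hypothetical.

```python
def match_score(preferences, song):
    """Return (matches, compared) over the features the listener cares about."""
    compared = {k: v for k, v in preferences.items() if v is not None}
    matches = sum(1 for k, v in compared.items() if song.get(k) == v)
    return matches, len(compared)

# The two "models" of musical taste from the text.
you = {
    "guitar_solo": True, "tone_changes": True,
    "has_vocals": True, "vocal_gender": "female",
    "has_drums": True, "drum_type": "electronic",
}
me = {
    "guitar_solo": True, "tone_changes": True,
    "has_vocals": False, "vocal_gender": None,  # no vocals, so no gender preference
    "has_drums": True, "drum_type": "regular",
}

# The new song to classify.
song = {
    "guitar_solo": True, "tone_changes": True,
    "has_vocals": True, "vocal_gender": "female",
    "has_drums": True, "drum_type": "electronic",
}

print(match_score(you, song))  # (6, 6): a 100% match
print(match_score(me, song))   # (3, 5): 3 of the 5 features I care about
```

Turning the score into a yes/no answer (for example, "likes the song if more than half of the features match") is what makes this a classification.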
How do we build a model for something? Of course, from data. When you're working with Weka, your .arff files contain your training data, which Weka uses to learn about the thing depicted by those data (in our example, it would learn our musical preferences).
The learning process generates a model, which is then used to classify new data, group them, etc. For instance, if we provided Weka with our music preferences and instructed it to learn our models with a Bayesian classifier, then given the features of a new song it would be able to tell whether we'd like that song or not, and with what probability.
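To make "learn a model, then ask for a probability" concrete, here is a minimal naive Bayes sketch in plain Python. It is not Weka's implementation; the function names, the add-one (Laplace) smoothing, and the six training songs are assumptions invented for this example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Learn the model: counts of labels and of feature values per label."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature, label) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of values seen
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return {"class_counts": class_counts,
            "value_counts": value_counts,
            "feature_values": feature_values}

def predict_proba(model, row):
    """Return {label: probability} for a new row, using only the model."""
    total = sum(model["class_counts"].values())
    scores = {}
    for label, count in model["class_counts"].items():
        score = count / total  # prior P(label)
        for feature, value in row.items():
            seen = model["value_counts"][(feature, label)][value]
            n_values = len(model["feature_values"][feature])
            score *= (seen + 1) / (count + n_values)  # smoothed P(value | label)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Invented training data: songs someone rated.
songs = [
    {"guitar_solo": "yes", "vocals": "female", "drums": "electronic"},
    {"guitar_solo": "yes", "vocals": "female", "drums": "regular"},
    {"guitar_solo": "no",  "vocals": "female", "drums": "electronic"},
    {"guitar_solo": "yes", "vocals": "none",   "drums": "regular"},
    {"guitar_solo": "no",  "vocals": "male",   "drums": "regular"},
    {"guitar_solo": "no",  "vocals": "none",   "drums": "electronic"},
]
ratings = ["like", "like", "like", "dislike", "dislike", "dislike"]

model = train_naive_bayes(songs, ratings)
new_song = {"guitar_solo": "yes", "vocals": "female", "drums": "electronic"}
probabilities = predict_proba(model, new_song)
print(round(probabilities["like"], 2))  # 0.9 with this toy data
```

Here the model is nothing more than the three tables of counts returned by `train_naive_bayes`. A model saved from Weka plays the same role: it holds what was learned, and prediction no longer needs the original .arff data.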