Machine Learning - Classification Algorithms
I want to find the following probability:

P(y = 1 | n = k; theta)

Read as: the probability that the prediction is class 1, given number of words = k, parametrized by theta.

A traditional classifier doesn't have the conditional part (right?); it just gives

P(y = 1; theta)

How do I solve this?

EDIT:

For example, let's say I want to predict whether an email is spam or not based on the number of attachments. Let y = 1 indicate spam and y = 0 indicate non-spam. Then I want

P(y = 1 | num_attachments = 0; some attributes)

and so on. Does this make sense?
Comments (2)
Normally the number of attachments is just another attribute, so your probability is the same conditional probability as usual:

P(y = 1 | attributes; theta)

However, if you need some special treatment of the attachment attribute (say, the other attributes are numeric and the attachment attribute is boolean), you can compute them separately and then combine them with the Naive Bayes assumption:

P(C | A, B) = P(A | C) * P(B | C) * P(C) / P(A, B)

where C stands for the event y = 1, A for the attachment attribute, and B for the other attributes. See this paper for a description of several Naive Bayes classifiers.
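As a rough illustration of that combination (not part of the original answer), here is a minimal Python sketch that estimates P(A | C), P(B | C) and P(C) from a made-up toy training set and normalizes over the two classes:

# Minimal sketch of the combination described above. The training data is
# invented; it only illustrates estimating the factors from counts and
# combining them under the Naive Bayes assumption.
from collections import Counter

# (has_attachment, length_bucket, label) -- label 1 = spam, 0 = non-spam
emails = [
    (True,  "short", 1),
    (True,  "short", 1),
    (True,  "long",  1),
    (False, "short", 0),
    (False, "long",  0),
    (False, "long",  0),
]

label_counts = Counter(y for _, _, y in emails)
total = sum(label_counts.values())

def likelihood(index, value, c, num_values):
    """P(attribute = value | C = c) with add-one smoothing."""
    matching = sum(1 for row in emails if row[2] == c and row[index] == value)
    return (matching + 1) / (label_counts[c] + num_values)

def posterior(has_attachment, length_bucket):
    """P(C | A, B) for both classes, normalized."""
    scores = {}
    for c in (0, 1):
        scores[c] = (likelihood(0, has_attachment, c, 2) *
                     likelihood(1, length_bucket, c, 2) *
                     label_counts[c] / total)
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

print(posterior(True, "short"))   # the value under key 1 is P(y = 1 | A, B)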
Use a Naive Bayes classifier. You can code one yourself quite quickly, or use/look at the nltk library.
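For instance, a quick sketch of the nltk route (the feature names and the tiny training set here are made up for illustration):

import nltk

# Each training example is a (feature dict, label) pair; nltk treats
# feature values as discrete, so numeric attributes may need bucketing.
train_set = [
    ({"num_attachments": 2, "length": "short"}, "spam"),
    ({"num_attachments": 1, "length": "short"}, "spam"),
    ({"num_attachments": 0, "length": "long"},  "ham"),
    ({"num_attachments": 0, "length": "long"},  "ham"),
]

classifier = nltk.NaiveBayesClassifier.train(train_set)

email = {"num_attachments": 0, "length": "short"}
print(classifier.classify(email))                    # predicted label
print(classifier.prob_classify(email).prob("spam"))  # P(y = spam | attributes)

prob_classify gives you exactly the conditional probability you asked about, estimated under the Naive Bayes independence assumption.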