The best current spam filtering algorithm

Posted on 2024-10-08 12:44:52

What is currently the best method for detecting spam, especially in mobile text messages (SMS)?
Are there any resources or comparative analyses?

Comments (4)

箜明 2024-10-15 12:44:52

It's worth looking into supervised learning techniques. A number of studies have used the multinomial Naive Bayes classifier for email spam filtering with a lot of success. If it works for email spam, it should work for SMS filtering too. What you need is a large dataset of labelled example SMS texts, both spam and legitimate, to train the classifier on.

It may also be helpful to look into the support vector machine, which, although less widely used in spam filtering, is a much more powerful technique.

Also, training the algorithms on raw text alone may not be the best way forward. A 1998 study by Mehran Sahami and colleagues found that they achieved superior performance when they also took other heuristics into consideration, for example: Was the email sent to a mailing list? Was it sent from a domain name ending in ".edu", ".com" or ".org"? Did it contain runs of punctuation marks ("!!!")? And so forth.
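
As a rough sketch of what heuristics like these might look like in code: the function name, its parameters and the feature names below are made up for illustration and are not taken from the study. It only shows how such yes/no questions can be turned into features that sit alongside the word counts.

```python
import re


def heuristic_features(sender, body, sent_to_mailing_list=False):
    """Turn a few hand-written heuristics into boolean features.

    `sent_to_mailing_list` is assumed to be worked out elsewhere
    (for example from the message headers); here it is just passed in.
    """
    # Take the part after the last "@" as the sender's domain.
    domain = sender.rsplit("@", 1)[-1].lower()
    # Take the part after the last "." as the top-level domain.
    tld = domain.rsplit(".", 1)[-1] if "." in domain else ""
    return {
        "to_mailing_list": bool(sent_to_mailing_list),
        "sender_tld_edu": tld == "edu",
        "sender_tld_com": tld == "com",
        "sender_tld_org": tld == "org",
        # Two or more "!" or "?" in a row, e.g. "!!!".
        "repeated_punctuation": bool(re.search(r"[!?]{2,}", body)),
    }


features = heuristic_features("promo@deals.example.com", "Act now!!!")
print(features)
```

These booleans could then be fed to the classifier as extra features next to the text itself.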

But start with the multinomial Naive Bayes classifier. It's very simple to implement and easy to use, and in my experience it also trains very quickly.
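
To give an idea of how little code is involved, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python with add-one (Laplace) smoothing. The class, the tokenizer and the six training messages are all made up for illustration; a real filter would need a far larger labelled dataset and probably a library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return the unnormalised log-probability of each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training data: "ham" means a legitimate message.
train = [
    ("WIN a free prize now, text WIN to claim", "spam"),
    ("Free entry in a weekly cash prize draw", "spam"),
    ("URGENT! You have won a free holiday, call now", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("I'll call you when I get home", "ham"),
    ("Can you send me the report by tomorrow?", "ham"),
]
texts, labels = zip(*train)
model = MultinomialNB().fit(texts, labels)

print(model.predict("Claim your free prize now"))
print(model.predict("See you at lunch tomorrow"))
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together.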

寄风 2024-10-15 12:44:52

As I understand it, most modern spam filtering combines an implementation of Bayes' theorem with some heuristics, e.g. sender blacklists, standards compliance and sending patterns.

The easiest place to implement this in a mobile phone network would probably be the SMS message centre, since the message volume is higher there, which makes a lot of the heuristics easier to implement.
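
A rough sketch of how a Bayesian score and a couple of such heuristics might be combined at a central point is below. Everything in it is an assumption for illustration: the blacklist entry, the per-sender message limit, the probability threshold and the function name are invented, and the Bayesian spam probability is taken as an input rather than computed.

```python
from collections import Counter

# Hypothetical values, chosen only for the example.
BLACKLISTED_SENDERS = {"+15550100"}
MAX_MESSAGES_PER_WINDOW = 100   # sending-pattern heuristic
SPAM_PROBABILITY_THRESHOLD = 0.9


def is_spam(sender, bayes_spam_probability, sent_counts):
    """Combine simple heuristics with a Bayesian spam probability.

    `sent_counts` maps each sender to the number of messages it has
    sent in the current time window, as a message centre could count.
    """
    if sender in BLACKLISTED_SENDERS:
        return True
    if sent_counts[sender] > MAX_MESSAGES_PER_WINDOW:
        return True
    return bayes_spam_probability >= SPAM_PROBABILITY_THRESHOLD


sent_counts = Counter({"+15550123": 3, "+15550199": 5000})
print(is_spam("+15550123", 0.20, sent_counts))  # ordinary sender, low score
print(is_spam("+15550199", 0.20, sent_counts))  # unusually high volume
print(is_spam("+15550100", 0.01, sent_counts))  # blacklisted sender
```

The per-sender counts are the kind of signal that is only visible where traffic from many senders passes through one place.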

弄潮 2024-10-15 12:44:52

Using a wide variety of algorithms and heuristics (rather than "the" best method) is a good approach to protecting your network and subscribers from spam, fraud, malicious content, cyber-bullying, identity theft, viruses, etc.

Cloudmark and its various partners and competitors are a good place to start looking.

独﹏钓一江月 2024-10-15 12:44:52

Why do you need to detect spam after the fact? Nip it in the bud instead.

Update:
Filters are easily and widely abused by black-hat SEO/SEM operators and criminals to get competitors blacklisted or dumped.
Besides, they are reactive and hence doomed to always lag behind advances in spammers' techniques.
