How to write a spam filter

Posted 2024-07-08 16:24:04

I'm stuck having to write a simple spam filter, and I'm not really sure how I'm going to do it.

So far I've come up with word-list and domain filtering, which add or remove points; a message is treated as spam once it reaches a certain threshold.

For example, if you write about "v1agr4" from a blacklisted domain, you get 2 spam points, but if you write about "v1agr4" from a hotmail.com account, you get only 1 spam point.
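
A minimal sketch of the point scheme described above, in Python. The word list, the blacklisted domain `spam.example`, the weights, and the threshold of 2 are all made-up values for illustration; none of them come from the original post.

```python
import re

# Hypothetical word list and domain blacklist; each entry maps to its point weight.
SPAM_WORDS = {"v1agr4": 1}
BLACKLISTED_DOMAINS = {"spam.example": 1}
THRESHOLD = 2  # assumed cutoff: at or above this score a message counts as spam


def spam_score(sender: str, body: str) -> int:
    """Sum the weight of the sender's domain (if listed) and of every listed word found."""
    domain = sender.rsplit("@", 1)[-1].lower()
    score = BLACKLISTED_DOMAINS.get(domain, 0)
    # Split on anything that is not a letter or digit so "v1agr4!" still matches.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    score += sum(weight for word, weight in SPAM_WORDS.items() if word in words)
    return score


def is_spam(sender: str, body: str) -> bool:
    """Apply the assumed threshold to the score."""
    return spam_score(sender, body) >= THRESHOLD
```

With these assumed weights, a message mentioning "v1agr4" scores 2 when sent from `spam.example` and 1 when sent from hotmail.com, which matches the example above. Negative weights for trusted words or domains would cover the "remove points" part.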

Do you guys have any other suggestions / resources?

This is more about learning how spam filters work than developing something enterprise-grade.

Comments (6)

习惯成性 2024-07-15 16:24:04

Some really good algorithm info here:

http://www.paulgraham.com/spam.html

http://www.paulgraham.com/better.html

But, seriously, why reinvent the wheel?

Just download K9: http://keir.net/k9.html

隔岸观火 2024-07-15 16:24:04

Some open source Java projects related to Bayesian spam filtering (which LFSR Consulting mentioned):

And one extra for C++:

和我恋爱吧 2024-07-15 16:24:04

Look into Bayesian spam filtering.

I know Perl has a library for it, so I'd assume Java has one too.
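
To give a rough idea of what Bayesian filtering does, here is a small Python sketch of a plain naive Bayes classifier with add-one smoothing. It is not the API of any particular Perl or Java library; the class name `NaiveBayesFilter`, its methods, and the tokenizer are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into letter/digit tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical minimal naive Bayes spam classifier (illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled message; label must be "spam" or "ham"."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text) from the token counts seen so far."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior, so an untrained class never yields log(0).
            log_score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # Add-one smoothing; the extra +1 reserves a slot for unseen tokens.
                log_score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_score
        # Normalise the two log scores into a probability without overflow.
        highest = max(log_scores.values())
        weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

You would call `train` on messages you have already labelled and then treat a message as spam when `spam_probability` exceeds some cutoff you pick, for example 0.9. Real filters add refinements on top of this, such as better tokenization and handling of headers.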

萌酱 2024-07-15 16:24:04

You can delegate that to a distributed service. Akismet is a very good solution.

顾铮苏瑾 2024-07-15 16:24:04

How you write a spam filter depends on how much you need it to scale.

If you want a scalable solution, content filtering is probably not the smart choice, because it consumes a lot of CPU and memory. You would rather choose reputation-based or blacklist-based filtering, which is far more CPU-friendly on your server and also much easier to write.

I wrote a post on my blog that explains the idea behind writing a spam filter from a programmer's point of view and covers all the options, from content-based filtering to blacklist-based filtering.
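
To show why the blacklist-based approach mentioned above is cheap, here is a minimal Python sketch: the check is a handful of set lookups on the sender's domain and never touches the message body. The `BLACKLIST` entries and the function name `is_blacklisted` are hypothetical; a real deployment would presumably load the list from a file or a reputation feed.

```python
# Hypothetical blacklist; these example domains are placeholders.
BLACKLIST = {"spam.example", "bulk-mailer.example"}


def is_blacklisted(sender: str, blacklist=BLACKLIST) -> bool:
    """Return True if the sender's domain, or any parent domain of it, is listed."""
    domain = sender.rsplit("@", 1)[-1].lower().rstrip(".")
    labels = domain.split(".")
    # For "mail.spam.example" this checks "mail.spam.example", "spam.example", "example".
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))
```

Checking parent domains is a design choice in this sketch so that subdomains of a listed domain are caught too; drop it if you only want exact matches.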
