How does the Gmail spam filter work?

Posted on 2024-09-10 16:30:10

I'm always surprised by the high quality of the Gmail spam filter. Over the last year, it filtered 99.95% of the spam and blocked only one legitimate mail by mistake. By comparison, every other mail service I have used makes at least one mistake for every 50 mails.

How does Gmail reach this level of quality internally? Is it based on customer feedback (i.e., if N customers mark a mail as spam, it is classified as spam for every other customer)? Or is there some trick? Maybe a basic filtering algorithm catches the most obvious spam, and some difficult cases are analyzed by real humans?

Comments (5)

逆夏时光 2024-09-17 16:30:10

Briefly speaking, this is based on community feedback. Here is a quote from the official explanation:

"Gmail users play an important role in keeping spammy messages out of millions of inboxes. When the Gmail community votes with their clicks to report a particular email as spam, our system quickly learns to start blocking similar messages. The more spam the community marks, the smarter our system becomes."

You can read a bit more about it on their Spam Explained page.
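
To make the "N users report it, so it is blocked for everyone" idea from the question concrete, here is a minimal Python sketch. It is only an illustration and not how Gmail actually works: the class name `CommunityReports`, the threshold value, and the fingerprinting scheme are all assumptions. In particular, hashing the normalized body is a crude stand-in for the "similar messages" matching mentioned in the quote.

```python
import hashlib
from collections import defaultdict

REPORT_THRESHOLD = 3  # hypothetical value of N, chosen only for illustration


class CommunityReports:
    """Counts distinct users who reported the same message as spam."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        # fingerprint -> set of user ids that reported that message
        self._reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body):
        # Lowercase and collapse whitespace so trivial variations of the
        # same message map to the same key (an assumed, simplistic scheme).
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user_id, body):
        # A set is used so that repeated reports by one user count once.
        self._reporters[self.fingerprint(body)].add(user_id)

    def is_spam(self, body):
        reporters = self._reporters.get(self.fingerprint(body), ())
        return len(reporters) >= self.threshold
```

A message is treated as spam for everyone only once `threshold` different users have reported it, which keeps a single user from blocking a sender for the whole community.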

烟酉 2024-09-17 16:30:10

This is the million-dollar question, and if it could be answered on Stack Overflow, then everyone's spam filter would be just as effective.

软甜啾 2024-09-17 16:30:10

I don't know exactly how Google does spam filtering (I think it's a trade secret, after all). If you are interested in how spam filtering works, I would recommend looking at Bayesian spam filtering (http://en.wikipedia.org/wiki/Bayesian_spam_filtering). It's a rather easy-to-understand method.
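
For readers who want to see the Bayesian approach in code, here is a minimal naive Bayes sketch in Python. It assumes whitespace tokenization and add-one (Laplace) smoothing, and the names `NaiveBayesSpamFilter`, `train`, and `spam_probability` are made up for this example. It illustrates the general technique, not Google's filter.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior, smoothed so an unseen class never gets probability 0.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # The extra +1 in the denominator reserves mass for unknown words.
            denominator = total_words + len(vocabulary) + 1
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            log_scores[label] = score
        # Convert the two log scores to a probability in a numerically safe way.
        highest = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - highest)
        ham = math.exp(log_scores["ham"] - highest)
        return spam / (spam + ham)
```

After training on a handful of labeled messages, `spam_probability(text)` returns a value between 0 and 1. Working in log space avoids the underflow that multiplying many small word probabilities would cause.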

匿名的好友 2024-09-17 16:30:10

Google is most likely using a classifier system, such as logistic regression or neural networks. State-of-the-art spam detection frequently employs machine learning algorithms such as these.

The output classification is "Spam" or "Not Spam." The inputs, I'm sure, are top secret at Google, but I'm confident that certain email text phrases such as "Buy Now," "On Sale," "Viagra," or "Male Enhancement" are all factors in their model.
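
As a rough illustration of the classifier idea, here is a small logistic regression in plain Python. It uses the phrases mentioned above as binary features and is trained with stochastic gradient descent. The feature set, the training examples, and the hyperparameters are assumptions picked for the sketch. A real system would use far more inputs, and nothing here reflects Google's actual model.

```python
import math

# Phrases from the answer above, used as binary "is it present?" features.
PHRASES = ("buy now", "on sale", "viagra", "male enhancement")

# Tiny made-up training set: (text, label) with 1 = spam, 0 = not spam.
TRAINING_EXAMPLES = [
    ("Buy now and save", 1),
    ("Viagra on sale", 1),
    ("Male enhancement pills", 1),
    ("Lunch tomorrow?", 0),
    ("Quarterly report attached", 0),
    ("See you at the meeting", 0),
]


def features(text):
    lowered = text.lower()
    # The leading 1.0 is the bias term.
    return [1.0] + [1.0 if phrase in lowered else 0.0 for phrase in PHRASES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam_probability(weights, text):
    return sigmoid(sum(w * x for w, x in zip(weights, features(text))))


def train(examples, epochs=300, learning_rate=0.5):
    weights = [0.0] * (len(PHRASES) + 1)
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            error = predict_spam_probability(weights, text) - label
            # Gradient step for the log loss of a single example.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights
```

Calling `train(TRAINING_EXAMPLES)` returns one weight per phrase plus a bias, and `predict_spam_probability(weights, text)` gives the model's spam estimate for a new message. Hand-picked keyword features like these are easy for spammers to dodge, which is one reason real filters presumably learn from much richer signals.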

岁月染过的梦 2024-09-17 16:30:10

There is no official statement on this, and most of the advice out there is just observation or expert opinion.

Based on my observations of the emails we deliver, here are my findings:

1. User engagement is the key: if users are not engaging with your emails, then your emails are bound to be flagged as spam.
Here are some of the signals:
- Whom you email, and how often you email them
- Which emails you open
- Which emails you reply to
- Keywords that are in emails you usually read
- Which emails you star, archive, or delete

2. Sender domain reputation: what is the past history of the sending domain? If user engagement was high in the past, then the probability of a new email from the same domain landing in the inbox is high.

Google is using complex AI and machine learning algorithms to make this happen. You might get some success by changing the IP, domain, or return-path, but all of those will be very short-term hacks.
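
To show how engagement signals could feed a sender reputation, here is a hypothetical Python sketch. The event names, the weights in `ENGAGEMENT_WEIGHTS`, and the threshold are all invented for illustration. The observations above suggest that engagement matters, but not how it is actually scored.

```python
from dataclasses import dataclass

# Hypothetical weights: positive engagement raises the score,
# negative engagement lowers it. These numbers are assumptions.
ENGAGEMENT_WEIGHTS = {
    "open": 1.0,
    "reply": 3.0,
    "star": 2.0,
    "archive": 0.5,
    "delete_unread": -1.0,
    "report_spam": -5.0,
}


@dataclass
class DomainReputation:
    score: float = 0.0
    events: int = 0

    def record(self, event):
        # Raises KeyError for event names not listed in ENGAGEMENT_WEIGHTS.
        self.score += ENGAGEMENT_WEIGHTS[event]
        self.events += 1

    @property
    def average(self):
        # A domain with no history gets a neutral average of 0.0.
        return self.score / self.events if self.events else 0.0


def inbox_placement(reputation, threshold=0.0):
    # Assumed rule: mail from domains at or above the threshold reaches the inbox.
    return "inbox" if reputation.average >= threshold else "spam"
```

Because the score is built from how recipients have treated past mail, switching IPs or return-paths only resets the history for a while, which matches the point above that such changes are short-term hacks.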
