Why not run the emails through SpamAssassin or a similar filter that attaches a Bayes score? You can then just read that score, which saves you reinventing the wheel.
You could also Bayes-score each email against a database of all previous emails from that individual.
SpamAssassin can also check Sender Policy Framework (SPF) and DomainKeys for you.
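As a rough sketch of "just read that score": assuming the filter is SpamAssassin with its default X-Spam-Status header (a line containing score=..., required=... and tests=...), the score and the Bayes rule that fired can be pulled out with the standard library. The function name read_spam_status and the sample message are made up for illustration, and a differently configured filter may write other headers.

```python
import re
from email import message_from_string


def read_spam_status(raw_message: str) -> dict:
    """Extract score and rule names from an X-Spam-Status header.

    Assumes SpamAssassin's default header layout; returns score=None
    and an empty rule list if the header is absent.
    """
    msg = message_from_string(raw_message)
    # Unfold the header: continuation lines arrive with embedded newlines/tabs.
    header = " ".join(msg.get("X-Spam-Status", "").split())

    score_match = re.search(r"score=(-?\d+(?:\.\d+)?)", header)
    # The rule list runs until the next "key=" field or the end of the header.
    tests_match = re.search(r"tests=(.*?)(?=\s+\w+=|$)", header)

    tests = []
    if tests_match:
        tests = [t.strip() for t in tests_match.group(1).split(",") if t.strip()]

    return {
        "score": float(score_match.group(1)) if score_match else None,
        "tests": tests,
        # BAYES_00 ... BAYES_99 are the rule names for the Bayes buckets.
        "bayes": next((t for t in tests if t.startswith("BAYES_")), None),
    }


if __name__ == "__main__":
    sample = (
        "From: alice@example.com\n"
        "X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,\n"
        "\tSPF_PASS autolearn=ham version=3.4.2\n"
        "\n"
        "hello\n"
    )
    print(read_spam_status(sample))
```

The SPF_PASS entry in the rule list is how the SPF lookup mentioned above would show up in the same header.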
Probably not practical, but something that would work:
When a mail arrives, have a "reply to sender" function that simply asks whether they sent it. This could take the form of an automatically generated confirmation link.
But since I don't know the specifics of the project, this may not be practical. If each user had to confirm this way again and again, no one would put up with it.
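One way the automatically generated confirmation link could look, as a minimal sketch only: sign the message ID and sender address with a server-side secret, put the signature in the link, and check it when the link is followed. The base URL, parameter names and function names here are all hypothetical, and a real system would load the key from configuration and expire old links.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Assumption: a per-process random key for the demo; a real deployment
# would load a persistent secret from configuration instead.
SECRET_KEY = secrets.token_bytes(32)


def make_token(message_id: str, sender: str, key: bytes = SECRET_KEY) -> str:
    """HMAC over the message ID and the lower-cased sender address."""
    payload = f"{message_id}|{sender.lower()}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def confirmation_link(message_id: str, sender: str,
                      base_url: str = "https://example.com/confirm") -> str:
    """Build the link to mail back to the claimed sender (URL is hypothetical)."""
    query = urlencode({"id": message_id, "token": make_token(message_id, sender)})
    return f"{base_url}?{query}"


def verify(message_id: str, sender: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """Constant-time check that the token matches this message and sender."""
    expected = make_token(message_id, sender, key)
    return hmac.compare_digest(expected.encode(), token.encode())


if __name__ == "__main__":
    link = confirmation_link("<abc@mail.example.com>", "Alice@Example.com")
    print(link)
```

Only someone who can read mail at the claimed sender's address sees the link, which is the whole point of asking.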
To complement the earlier answers: not knowing the context in which you want to analyse this, and speaking very generally, I would suggest your first port of call is SPF or DomainKeys, to limit the chance of mail from a rogue source being accepted. I would also recommend sending through only one SMTP server, secured with SSL. I do this, and while travelling worldwide I have rarely been unable to send mail; in those cases the only thing that did work was webmail (no safe local SMTP).
In addition, if you are verifying that mail really comes from yourself, you could use PGP tools to sign your mail on sending and then filter out any mail without a valid signature. Enigmail in Thunderbird is a good way to sign automatically, and there are plugins for Outlook as well.
After that, if you really want to do a more forensic job on an email, you could use a Bayes spam filter to score it against a database of previous emails. You would build up a database of tokens from the data that recurs between messages (excluding entries such as "To:"), then score the email for the probability that it resembles the previous ones. In theory, any genuine mail should score very highly.
Obviously I don't know your situation. There are many techniques, but sometimes it is easier to go to the root of the issue than to try to fix it down the line.
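To make the token-database idea concrete, here is a toy naive Bayes scorer. It is a sketch under simplifying assumptions, not how SpamAssassin or any real filter is implemented: tokens are plain words of three or more characters, priors are equal, and smoothing is add-one. The class name SenderModel and the sample texts are invented for the example.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]{3,}")


def tokenize(text: str) -> list:
    """Lower-cased word tokens of three or more characters."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class SenderModel:
    """Token counts for one sender versus everybody else."""

    def __init__(self) -> None:
        self.sender = Counter()
        self.other = Counter()

    def train(self, text: str, from_sender: bool) -> None:
        (self.sender if from_sender else self.other).update(tokenize(text))

    def probability(self, text: str) -> float:
        """Probability (equal priors assumed) that the text is from the sender."""
        vocab = len(set(self.sender) | set(self.other)) or 1
        n_sender = sum(self.sender.values())
        n_other = sum(self.other.values())
        log_odds = 0.0
        for tok in set(tokenize(text)):
            # Add-one smoothing so unseen tokens do not zero out the product.
            p_sender = (self.sender[tok] + 1) / (n_sender + vocab)
            p_other = (self.other[tok] + 1) / (n_other + vocab)
            log_odds += math.log(p_sender / p_other)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    model = SenderModel()
    model.train("Cheers, Bob. See you at the climbing gym on Thursday", True)
    model.train("Dear customer, verify your account password immediately", False)
    print(model.probability("climbing gym thursday"))
    print(model.probability("verify your password"))
```

With so little training data the numbers mean very little; the point is only the shape of the calculation, and that a mis-trained message has to be removed again or it skews every later score.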
Update
Based on the context supplied:
I would consider using "address extensions", where your user sends mail to an address that carries a reference inside the address itself: [email protected] GMail and many other servers deliver mail addressed with a +extension@ through to the correct [email protected] without hijinks. You could get each user to send mail with a unique ID as the extension; that way you would know it had come from them, and they would feel more special. Obviously someone could steal the unique code by sniffing their outgoing or your incoming mail, but that is always possible, and anyone who can do that can probably inject mail as well.
If you really just want to go down the analysis route, I would suggest using the reverse of a SpamAssassin per-user Bayes match: compare every mail to a database of mails from a sender (instead of the traditional matching of mails 'to' an account). Remember that once your database is polluted with a false positive, you will have to remove it or risk the integrity of the matching for that sender.
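A small sketch of checking such an extension on the receiving side. The addresses, the issued-code table and the function names are all made up for illustration; the only assumption about the mail server is that it delivers base+extension@domain to the base mailbox and leaves the To: address intact.

```python
import hmac
from email.utils import parseaddr


def split_extension(address: str) -> tuple:
    """Split 'Name <base+ext@domain>' into (base, ext, domain)."""
    _, addr = parseaddr(address)
    local, _, domain = addr.rpartition("@")
    base, _, ext = local.partition("+")
    return base, ext, domain


def sender_matches(to_header: str, from_header: str, issued: dict) -> bool:
    """True if the extension in To: is the code issued to the From: address.

    `issued` maps lower-cased sender addresses to their unique codes
    (hypothetical storage; a real system would use a database).
    """
    _, ext, _ = split_extension(to_header)
    _, from_addr = parseaddr(from_header)
    expected = issued.get(from_addr.lower())
    if not ext or expected is None:
        return False
    # Constant-time comparison, on bytes so non-ASCII input cannot raise.
    return hmac.compare_digest(ext.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    codes = {"alice@example.org": "k3y9"}
    print(sender_matches("Inbox <inbox+k3y9@example.com>",
                         "Alice <alice@example.org>", codes))
```

As noted above, this proves knowledge of the code rather than identity, so it is only as strong as the secrecy of the address.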
Maybe look into using Sender Policy Framework. It might not be exactly what you are looking for but it might help.
Briefly, the design intent of the SPF record is to allow a receiving MTA (Message Transfer Agent) to interrogate the Name Server of the domain which appears in the email (the sender) and determine if the originating IP of the mail (the source) is authorized to send mail for the sender's domain.
Quoted from Wikipedia:
Sender Policy Framework (SPF), as defined in RFC 4408, is an e-mail validation system designed to prevent e-mail spam by addressing a common vulnerability, source address spoofing. SPF allows e-mail administrators the ability to specify which Internet hosts are allowed to send e-mail claiming to originate from that domain by creating a specific DNS SPF record in the public DNS record. Mail exchangers then use the DNS record to verify the sender's identity against the list published by the e-mail administrator.
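To illustrate what "verify the sender's identity against the list published" amounts to, here is a deliberately incomplete sketch that evaluates only the ip4, ip6 and all mechanisms of an SPF record that has already been fetched. It ignores include, a, mx, redirect and macros, so it can give the wrong answer for real-world records and is no substitute for a proper SPF library. The function name and the sample record are made up; the addresses come from the documentation ranges.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip_check(record: str, client_ip: str) -> str:
    """Evaluate ip4/ip6/all terms left to right; first match wins.

    Simplified: every other mechanism or modifier is skipped, which a
    real SPF implementation must not do.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record

    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()

        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and there was no "all"


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"
    for addr in ("192.0.2.10", "198.51.100.1", "2001:db8::1"):
        print(addr, spf_ip_check(record, addr))
```

In practice the record comes from a DNS TXT lookup on the envelope sender's domain, and the client IP is the address of the connecting SMTP host.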
You're right that all of the headers together, plus 'known good' emails to compare against, can help identify likely spoofed emails.
What you're developing would probably be at best a heuristic rather than an algorithm.
I'd consider weighting the fields by time of day, and by how close that is to the time of day of the 'known good' emails.
Also check whether the 'known good' emails are structured differently from the suspect one, e.g. inline images, HTML, shortened URLs, etc.
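The time-of-day idea could be sketched like this, as one possible weighting rather than a recommended one: take the Date: header of the suspect mail, find the nearest time of day among the 'known good' mails (wrapping around midnight), and scale it to a score between 0 and 1. The function names and the linear 12-hour fall-off are arbitrary choices for the example, and the hour used is whatever local time the Date: header itself states.

```python
from email.utils import parsedate_to_datetime

MINUTES_PER_DAY = 24 * 60


def minutes_of_day(date_header: str) -> int:
    """Minutes since midnight, in the timezone given by the Date: header."""
    dt = parsedate_to_datetime(date_header)
    return dt.hour * 60 + dt.minute


def circular_gap(a: int, b: int) -> int:
    """Shortest distance between two times of day, wrapping at midnight."""
    d = abs(a - b) % MINUTES_PER_DAY
    return min(d, MINUTES_PER_DAY - d)


def time_of_day_score(suspect_date: str, known_good_dates: list) -> float:
    """1.0 when sent at a usual time, falling linearly to 0.0 at 12 hours off.

    Requires at least one known-good date.
    """
    suspect = minutes_of_day(suspect_date)
    gap = min(circular_gap(suspect, minutes_of_day(d)) for d in known_good_dates)
    return 1.0 - gap / (MINUTES_PER_DAY / 2)


if __name__ == "__main__":
    known_good = [
        "Mon, 01 Jan 2024 09:00:00 +0000",
        "Wed, 03 Jan 2024 17:00:00 +0000",
    ]
    print(time_of_day_score("Tue, 02 Jan 2024 09:30:00 +0000", known_good))
    print(time_of_day_score("Tue, 02 Jan 2024 03:15:00 +0000", known_good))
```

A score like this would be one weighted input among several (header set, structure, and so on), never a verdict on its own.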