SMS spam filtering algorithm

Posted on 2024-12-13 14:13:02

I am developing an Android messaging application. Is there a spam filtering algorithm that works well for SMS? Please give me some pointers to get started.

Rahim.

Comments (1)

对风讲故事 2024-12-20 14:13:02

I don't think there is a single algorithm that can tell you definitively whether a user considers an SMS to be spam (an ad in an SMS can be important to some users and spam to others). What you can do, however, is what Google does to identify spam mail.

You could let the user mark an SMS as spam or not spam, and then use the content that user has marked as spam to decide whether a new message is likely to be spam for them.
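One way to learn from the user's own spam and not-spam marks is a naive Bayes text classifier. The answer does not name a specific technique, so this is a swapped-in choice, shown only as a minimal sketch. The class name `UserSpamFilter`, its methods, and the tokenizer are all hypothetical, and the code is in Python for brevity; the same logic should port to Kotlin or Java inside an Android app.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class UserSpamFilter:
    """Hypothetical per-user filter: multinomial naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def mark(self, text, is_spam):
        """Record one message the user labelled as spam or not spam."""
        label = "spam" if is_spam else "ham"
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | text) from the messages marked so far."""
        total = sum(self.message_counts.values())
        if total == 0:
            return 0.5  # nothing learned yet, so no opinion either way
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        tokens = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            log_score = math.log((self.message_counts[label] + 1) / (total + 2))
            n_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # The extra +1 in the denominator reserves mass for unseen words.
                log_score += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_scores[label] = log_score
        # Convert log scores to a probability without overflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

In use, the app would call `mark(...)` each time the user labels a message and compare `spam_probability(...)` against a threshold of its choosing for incoming messages. The threshold is a product decision and is not something the answer specifies.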

Edit: the closest thing to what you are looking for that I found is this PDF on Content Based SMS Spam Filtering.

It's not an algorithm but rather things you should keep in mind.

Quoting from the PDF:

The most popular techniques used to reduce spam nowadays include the following ones.

White and black listing. The senders occurring in a black list (e.g. RBL) are considered spammers, and their messages blocked. The messages from senders in a white list (e.g. the address book, or the provider itself – Hotmail) are considered legitimate, and thus delivered.

Collaborative filtering. When a user tags a message as spam, this is considered spam for users similar to him/her. Alternatively, the service provider considers that massive messages are spam.

Digital signatures. Messages without a digital signature are considered spam. Digital signatures can be provided by the sender or the service provider.

Content-based filtering. The most used method. Each message is searched for spam features, like indicative words (e.g. “free”, “viagra”, etc.), unusual distribution of punctuation marks and capital letters (like e.g. in “BUY!!!!!!”), etc.

There is a lot of good info in there. Check it out.
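Two of the techniques quoted above, white/black listing and content-based features, can be combined in a few lines. The following is only a sketch under stated assumptions: the word list, the thresholds, and the function names `content_features` and `classify` are hypothetical and chosen for illustration, not taken from the PDF. It is written in Python for brevity and should translate directly to Kotlin or Java on Android.

```python
import re

# Hypothetical indicative words; a real list would be tuned on real data.
INDICATIVE_WORDS = {"free", "viagra", "winner", "prize"}


def content_features(text):
    """Extract the kinds of features the quote mentions from one message."""
    letters = [c for c in text if c.isalpha()]
    words = re.findall(r"[a-z']+", text.lower())
    uppercase = sum(1 for c in letters if c.isupper())
    return {
        # How many indicative words appear in the message.
        "indicative_words": sum(1 for w in words if w in INDICATIVE_WORDS),
        # Share of letters that are capitals (0.0 when there are no letters).
        "uppercase_ratio": uppercase / len(letters) if letters else 0.0,
        # Longest run of '!' or '?' characters, as in "BUY!!!!!!".
        "longest_punct_run": max(
            (len(run) for run in re.findall(r"[!?]+", text)), default=0
        ),
    }


def classify(sender, text, whitelist, blacklist):
    """Return "deliver" or "block" using sender lists first, then content."""
    if sender in whitelist:
        return "deliver"  # e.g. a number in the user's address book
    if sender in blacklist:
        return "block"
    features = content_features(text)
    score = features["indicative_words"]
    if features["uppercase_ratio"] > 0.6:  # assumed threshold
        score += 1
    if features["longest_punct_run"] >= 3:  # assumed threshold
        score += 1
    return "block" if score >= 2 else "deliver"  # assumed cut-off
```

Checking the sender lists before looking at content means a message from a known contact is never blocked by a keyword match, which matches the quote's description of whitelisted senders as legitimate.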
