Spam detection in (Objective-)C

Posted on 2024-08-13 09:50:14

I'm currently writing an iPhone application which gets some data from the user and uploads it to a server. The uploaded data will be displayed to other users of the same program (there's more to it than that, but to keep the idea simple...). The uploaded data is basically just three strings: a name (max. 50 characters), a title (max. 50 characters) and some text (virtually unlimited length). What I need is basically a function, service or algorithm which can judge how valid the input is. It would have to detect runs of repeated characters, certain 'illegal' words, abnormal whitespace, etc. So my question is: is there a C or Objective-C library (built-in or open source) for this sort of data validation, or else, how would I go about doing this kind of check?
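As a rough illustration of the kind of check meant here, the "repeated characters" and "abnormal whitespace" parts could be sketched in plain C roughly as follows. The function names and the thresholds (4 and 3) are hypothetical and chosen only for this example. The code works on bytes, so a multi-byte UTF-8 character such as "æ" is not treated as a single character.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest run of one repeated byte, e.g. "aaab" gives 3. */
size_t longest_char_run(const char *s)
{
    size_t best = 0, run = 0;
    char prev = '\0';

    for (; *s != '\0'; ++s) {
        /* Extend the current run if this byte repeats the previous one. */
        run = (run > 0 && *s == prev) ? run + 1 : 1;
        prev = *s;
        if (run > best)
            best = run;
    }
    return best;
}

/* Length of the longest run of consecutive whitespace bytes
 * (spaces, tabs, newlines, ...). */
size_t longest_whitespace_run(const char *s)
{
    size_t best = 0, run = 0;

    for (; *s != '\0'; ++s) {
        if (isspace((unsigned char)*s)) {
            if (++run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Returns 1 if either run exceeds its (assumed) threshold, else 0. */
int has_suspicious_runs(const char *s)
{
    return longest_char_run(s) > 4 || longest_whitespace_run(s) > 3;
}
```

With these assumed thresholds, a text that ends in a long series of newlines would be flagged, while an ordinary sentence would not.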

Here are two examples of good and bad data:

GOOD:

Name: "John Aaron Smith"  
Title: "Why am I still here?"  
Text: "Can anybody please help me? I'm feeling lonely!"

BAD:

Name: "f**k you, kldsanfklds"   
Title: "Only $99. Buy Now. Only $99"  
Text: "ndsaklgnvds lakævndsaklæfhadsæhdsjka fhdskjafhdskj lafhsdkhf. €#&/ #&()(/&%& ># €%€#% €#& hidosæahviædshvidshfiodsa. adsifjDSILFJIDSH \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
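The other two symptoms in the bad example, symbol-heavy text and unwanted phrases, could be approximated along these lines. This is only a sketch under stated assumptions: the function names and the phrase list are hypothetical, the symbol check only counts ASCII punctuation (so a multi-byte character like "€" is not counted as a symbol), and the phrase check is a plain case-insensitive substring match that will also hit phrases embedded in longer words.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fraction of non-whitespace bytes that are ASCII punctuation (0.0 to 1.0). */
double symbol_ratio(const char *s)
{
    size_t symbols = 0, total = 0;

    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (isspace(c))
            continue;
        ++total;
        if (c < 0x80 && ispunct(c))
            ++symbols;
    }
    return total == 0 ? 0.0 : (double)symbols / (double)total;
}

/* Case-insensitive substring search for ASCII text; returns 1 if found. */
int contains_ci(const char *text, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return 0;
    for (; *text != '\0'; ++text) {
        size_t i = 0;
        while (i < n && text[i] != '\0' &&
               tolower((unsigned char)text[i]) ==
               tolower((unsigned char)needle[i]))
            ++i;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Checks the text against a small, purely illustrative phrase list. */
int has_banned_phrase(const char *text)
{
    static const char *const banned[] = { "buy now", "only $" };
    size_t i;

    for (i = 0; i < sizeof banned / sizeof banned[0]; ++i) {
        if (contains_ci(text, banned[i]))
            return 1;
    }
    return 0;
}
```

A caller could then reject input when `symbol_ratio` exceeds some chosen cutoff or when `has_banned_phrase` returns 1; the right cutoff and phrase list would have to be tuned against real submissions.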

I know that covering so many cases will be difficult, but this algorithm/library only has to filter out the worst spam. I will also review the data myself before the final database submission, but of course the less spam gets through, the easier that will be.

Yours,
BEN.

EDIT: My most 'fluent' language is Objective-C, but I'm also doing pretty well with C, and I have knowledge of PHP and Java. Libraries/examples in other languages might be difficult for me to understand and to 'translate' into a valid iPhone language.

EDIT-EDIT: I'm not looking for something overly sophisticated. Just a simple way for me to do the rough cut.
