Detecting spammers with MySQL

Posted on 2025-01-05 16:10:24

I see an ever-increasing number of users signing up on my site just to send duplicate spam messages to other users. I've added some server-side code to detect duplicate messages with the following MySQL query:

  SELECT COUNT(content) AS msgs_sent
    FROM messages
   WHERE sender_id = '.$sender_id.'
GROUP BY content
  HAVING COUNT(content) > 10

The query works well, but now they're getting around it by changing a few characters in their messages. Is there a way to detect this with MySQL, or do I need to look at each grouping returned from MySQL and then use PHP to determine the percentage of similarity?

Any thoughts or suggestions?

像极了他 2025-01-12 16:10:24

Fulltext Match

You could implement something similar to the MATCH ... AGAINST example in the MySQL documentation:

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

So for your example, perhaps:

SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string')
-- The score alias cannot be referenced in WHERE, so filter on it in HAVING.
HAVING score > 1;

Note that to use these functions, the content column must have a FULLTEXT index.

What is score in this example?

It is a relevance value. It is computed through the process described below:

Every correct word in the collection and in the query is weighted
according to its significance in the collection or query.
Consequently, a word that is present in many documents has a lower
weight (and may even have a zero weight), because it has lower
semantic value in this particular collection. Conversely, if the word
is rare, it receives a higher weight. The weights of the words are
combined to compute the relevance of the row.

From the documentation page.
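
To make the quoted description more concrete, here is a minimal Python sketch of rarity-based word weighting. It is an illustration only and not MySQL's actual relevance formula: the `tokenize` and `relevance` functions, the log-based weight, and the sample messages are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, documents):
    """Score each document against the query using rarity-based weights.

    Illustrative assumption: weight = log(N / df), so a word found in every
    document weighs zero and a rare word weighs more. MySQL's real formula
    differs; this only mirrors the idea in the quoted passage.
    """
    n_docs = len(documents)
    doc_tokens = [tokenize(d) for d in documents]
    # Document frequency: the number of documents each word occurs in.
    doc_freq = Counter(word for tokens in doc_tokens for word in set(tokens))
    query_words = set(tokenize(query))
    scores = []
    for tokens in doc_tokens:
        counts = Counter(tokens)
        # Sum the weights of the query words that appear in this document.
        score = sum(
            counts[w] * math.log(n_docs / doc_freq[w])
            for w in query_words
            if w in counts
        )
        scores.append(score)
    return scores


# Hypothetical sample data: a spam message, a lightly altered copy of it,
# and an unrelated message.
messages = [
    "Buy cheap watches now at example dot com",
    "Buy cheap watchez now at example dot com",
    "Are we still meeting for lunch tomorrow",
]
scores = relevance("Buy cheap watches now at example dot com", messages)
```

In this sketch the altered copy scores lower than the exact copy, because the changed word no longer matches, but it still scores well above the unrelated message. That gap is what a threshold on the score relies on; the right threshold for real data is something you would have to find by experiment.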
