BOT/Spider Trap Ideas

I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal-looking user agents with random IPs, but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern, and my suspicion is it's a fleet of Windows zombies.

The client has had issues in the past with SPAM attacks--even had to point MX at Postini to stop the 6.7 GB/day of junk server-side.

I want to set up a BOT trap in a directory disallowed by robots.txt... I've just never attempted anything like this before, and I'm hoping someone out there has some creative ideas for trapping BOTs!

EDIT: I already have plenty of ideas for catching one... it's what to do with one once it lands in the trap.
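Roughly this layout is what I mean (paths are placeholders): disallow a directory in robots.txt, then link to it invisibly, so only crawlers that ignore the rules ever arrive:

# robots.txt -- well-behaved crawlers never enter the trap directory
User-agent: *
Disallow: /admin/

<!-- on a normal page: a link no human sees or clicks -->
<a href="/admin/trap.php" style="display:none">&nbsp;</a>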

Comments (5)

燕归巢 2024-10-02 03:47:30

Well I must say, kinda disappointed--I was hoping for some creative ideas. I did find the ideal solution here: http://www.kloth.net/internet/bottrap.php

<html>
    <head><title> </title></head>
    <body>
    <p>There is nothing here to see. So what are you doing here?</p>
    <p><a href="http://your.domain.tld/">Go home.</a></p>
    <?php
      /* whitelist: end processing and exit */
      if (preg_match("/10\.22\.33\.44/", $_SERVER['REMOTE_ADDR'])) { exit; }
      if (preg_match("/Super Tool/", $_SERVER['HTTP_USER_AGENT'])) { exit; }
      /* end of whitelist */
      $badbot = 0;
      /* scan the blacklist.dat file for addresses of SPAM robots
         to prevent filling it up with duplicates */
      $filename = "../blacklist.dat";
      $fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
      while ($line = fgets($fp, 255)) {
        $u = explode(" ", $line);
        $u0 = trim($u[0]);
        if ($u0 == "") { continue; } /* skip blank lines */
        /* preg_quote escapes the dots so 10.22.33.44 cannot match 10x22y33z44 */
        if (preg_match("/" . preg_quote($u0, "/") . "/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
      }
      fclose($fp);
      if ($badbot == 0) { /* we just saw a new bad bot not yet listed! */
        /* send a mail to hostmaster */
        $timestamp = time();
        $datum = date("Y-m-d (D) H:i:s", $timestamp);
        $from = "[email protected]";
        $to = "[email protected]";
        $subject = "domain-tld alert: bad robot";
        $msg  = "A bad robot hit {$_SERVER['REQUEST_URI']} $datum\n";
        $msg .= "address is {$_SERVER['REMOTE_ADDR']}, agent is {$_SERVER['HTTP_USER_AGENT']}\n";
        mail($to, $subject, $msg, "From: $from");
        /* append bad bot address data to blacklist log file: */
        $fp = fopen($filename, 'a+');
        fwrite($fp, "{$_SERVER['REMOTE_ADDR']} - - [$datum] \"{$_SERVER['REQUEST_METHOD']} {$_SERVER['REQUEST_URI']} {$_SERVER['SERVER_PROTOCOL']}\" {$_SERVER['HTTP_REFERER']} {$_SERVER['HTTP_USER_AGENT']}\n");
        fclose($fp);
      }
    ?>
    </body>
</html>

Then, to protect pages, put <?php include($_SERVER['DOCUMENT_ROOT'] . "/blacklist.php"); ?> on the first line of every page. blacklist.php contains:

<?php
    $badbot = 0;
    /* look for the visitor's IP address in the blacklist file */
    $filename = "../blacklist.dat";
    $fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
    while ($line = fgets($fp, 255)) {
      $u = explode(" ", $line);
      $u0 = trim($u[0]);
      if ($u0 == "") { continue; } /* skip blank lines */
      /* preg_quote keeps the dots in the logged IP from acting as wildcards */
      if (preg_match("/" . preg_quote($u0, "/") . "/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
    }
    fclose($fp);
    if ($badbot > 0) { /* this is a bad bot, reject it */
      sleep(12); /* stall the bot before answering */
      print ("<html><head>\n");
      print ("<title>Site unavailable, sorry</title>\n");
      print ("</head><body>\n");
      print ("<center><h1>Welcome ...</h1></center>\n");
      print ("<p><center>Unfortunately, due to abuse, this site is temporarily not available ...</center></p>\n");
      print ("<p><center>If you feel this is in error, send a mail to the hostmaster at this site,<br>
             if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
      print ("</body></html>\n");
      exit;
    }
?>

I plan to take Scott Chamberlain's advice and, to be safe, I plan to implement a CAPTCHA on the script. If the user answers correctly then it'll just die or redirect back to the site root. Just for fun I'm throwing the trap in a directory named /admin/ and of course adding Disallow: /admin/ to robots.txt.
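A rough sketch of that CAPTCHA escape hatch at the top of the trap page -- captcha_is_valid() is a hypothetical stand-in for whatever CAPTCHA library you use:

<?php
  /* sketch only: captcha_is_valid() is a placeholder for a real CAPTCHA check */
  if (isset($_POST['captcha']) && captcha_is_valid($_POST['captcha'])) {
    /* a human solved it: send them back to the site root, skip the blacklisting below */
    header("Location: http://your.domain.tld/");
    exit;
  }
?>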

EDIT: In addition, I am redirecting bots that ignore the rules to this page: http://www.seastory.us/bot_this.htm

梦亿 2024-10-02 03:47:30

You could first take a look at where the IPs are coming from. My guess is that they are all coming from one country, like China or Nigeria, in which case you could set up something in .htaccess to disallow all IPs from those countries. As for creating a trap for bots, I haven't the slightest idea.
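If the addresses really do cluster by country, an Apache 2.2-style .htaccess sketch might look like this -- the CIDR ranges below are placeholders, you'd substitute the real per-country allocations from a GeoIP or registry list:

# .htaccess sketch -- ranges below are placeholders, not real country allocations
Order Allow,Deny
Allow from all
Deny from 1.2.3.0/24
Deny from 5.6.0.0/16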

ま柒月 2024-10-02 03:47:29

What you can do is get another box (a kind of sacrificial lamb) that's not on the same pipe as your main host, then have it host a page which redirects to itself (but with a randomized page name in the URL). This could get the bot stuck in an infinite loop, tying up the CPU and bandwidth on your sacrificial lamb but not on your main box.
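A minimal PHP sketch of such a self-redirecting page, assuming a rewrite rule on the sacrificial box maps every random name back to this one script:

<?php
  /* tarpit sketch: every hit 302s to a fresh random URL served by this same script */
  /* assumes something like "RewriteRule ^loop/ /loop.php [L]" in the host config */
  sleep(5); /* slow each round trip to burn the bot's time */
  $next = "/loop/" . md5(uniqid(mt_rand(), true)) . ".html";
  header("Location: $next", true, 302);
  exit;
?>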

百思不得你姐 2024-10-02 03:47:29

I tend to think this is a problem better solved with network security than with coding, but I see the logic in your approach/question.

There are a number of questions and discussions about this on Server Fault which may be worth investigating.

https://serverfault.com/search?q=block+bots

沦落红尘 2024-10-02 03:47:28

You can set up a PHP script whose URL is explicitly forbidden by robots.txt. In that script, you can pull the source IP of the suspected bot hitting you (via $_SERVER['REMOTE_ADDR']), and then add that IP to a database blacklist table.

Then, in your main app, you can check the source IP, do a lookup for that IP in your blacklist table, and if you find it, throw a 403 page instead. (Perhaps with a message like, "We've detected abuse coming from your IP, if you feel this is in error, contact us at ...")
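A sketch of both pieces using PDO with SQLite -- the database path, table name, and schema here are assumptions, not anything from the original answer:

<?php
  /* trap.php -- the URL that robots.txt forbids; records the visitor's IP */
  $db = new PDO("sqlite:/var/data/blacklist.db"); /* placeholder path */
  $db->exec("CREATE TABLE IF NOT EXISTS blacklist (ip TEXT PRIMARY KEY, seen INTEGER)");
  $stmt = $db->prepare("INSERT OR IGNORE INTO blacklist (ip, seen) VALUES (?, ?)");
  $stmt->execute(array($_SERVER['REMOTE_ADDR'], time()));

  /* in the main app, near the top of every page: check and reject */
  $stmt = $db->prepare("SELECT 1 FROM blacklist WHERE ip = ?");
  $stmt->execute(array($_SERVER['REMOTE_ADDR']));
  if ($stmt->fetchColumn()) {
    header("HTTP/1.0 403 Forbidden");
    exit("We've detected abuse coming from your IP; if you feel this is in error, contact us.");
  }
?>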

On the upside, you get automatic blacklisting of bad bots. On the downside, it's not terribly efficient, and it can be dangerous. (One person innocently checking that page out of curiosity can result in the ban of a large swath of users.)

Edit: Alternatively (or additionally, I suppose) you can fairly simply add a GeoIP check to your app, and reject hits based on country of origin.
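For the GeoIP variant, a sketch assuming the PECL geoip extension is installed (the blocked-country codes are placeholders):

<?php
  /* sketch: reject by country of origin; requires the PECL geoip extension */
  $country = geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);
  $blocked = array("XX", "YY"); /* placeholder ISO country codes */
  if ($country !== false && in_array($country, $blocked)) {
    header("HTTP/1.0 403 Forbidden");
    exit;
  }
?>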
