PHP file was executed by the Alexa crawler and caused problems!

Posted on 2024-08-23 05:29:53

I wrote a script to release new pages automatically at a particular time. It shows a countdown timer, and when the timer reaches 0 it renames a particular file to index.php and renames the current index.php to index-modified.php.

This worked without problems. But at some point my customer told me that the site was not loading. I found that index.php had been renamed to index-modified.php, while all other pages were working fine. Without index.php, the site was showing a 404 error.

I then analyzed the access log and found that the Alexa crawler had accessed the release script, which caused the problem.

How did the Alexa crawler find my internal script file and crawl it? Will this happen to all of my internal admin files? I don't have links to that script on any of my pages.

I wonder how it could find files that exist only on my server.

Comments (4)

负佳期 2024-08-30 05:29:53

I wonder how it could find the files that are present inside my server?

Probably because someone who accessed those files was using the Alexa Toolbar, which reports the URLs its users visit.

The crawler was only able to cause damage because there are two things wrong with the script:

  1. It is not protected by an authentication/authorization layer.

  2. It makes a significant change on the server in response to a GET request. The HTTP spec defines GET as a "safe" method, meant only for retrieval, and provides POST for requests that change something.
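
Both points can be sketched in a few lines of PHP. This is only an illustration, not the original script: the function name `release_response_status` and the session flag `is_admin` are assumptions, and the actual login mechanism would depend on the site.

```php
<?php
// Sketch of a guard for the release script. All names are hypothetical.

/**
 * Return the HTTP status the release script should answer with.
 * Only an authenticated POST may trigger the rename.
 */
function release_response_status(string $method, bool $authenticated): int
{
    if ($method !== 'POST') {
        return 405; // Method Not Allowed: GET must never change the site
    }
    if (!$authenticated) {
        return 403; // Forbidden: the caller has not logged in as an admin
    }
    return 200;
}

// Wiring for a web request; skipped when the file is run from the command line.
if (PHP_SAPI !== 'cli') {
    session_start();
    $method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
    $authenticated = !empty($_SESSION['is_admin']); // assumed login flag

    $status = release_response_status($method, $authenticated);
    if ($status !== 200) {
        if ($status === 405) {
            header('Allow: POST');
        }
        http_response_code($status);
        exit;
    }

    // Only now perform the rename of index.php described in the question.
}
```

With a guard like this, a crawler that merely fetches the URL with GET receives a 405 response and nothing on the server changes.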

烟酉 2024-08-30 05:29:53

index.php is typically the default PHP script for a directory. It is executed when you navigate to the directory without giving a filename.

To solve this, use POST to invoke the modifications. If you can't do that, then at least give the script a name that is unlikely to be guessed.

迷路的信 2024-08-30 05:29:53

You should use robots.txt and disallow spiders from crawling:

User-agent: *
Disallow: /index.php
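
Note that the rule above concerns index.php itself, which is the site's front page. A sketch that targets the release script instead, assuming it lives at the hypothetical path /release.php, might look like the following. Paths in robots.txt start with a slash, the file is only advisory, and anything listed in it is publicly readable, so it does not replace access control.

```text
User-agent: *
Disallow: /release.php
```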

怀念你的温柔 2024-08-30 05:29:53

If your script is located within the htdocs folder (for Apache), chances are that crawlers will find it and try to crawl it. What you can do is:

1) Put a rule in robots.txt. You can learn more about it here:
http://www.javascriptkit.com/howto/robots.shtml

This will advise crawlers not to request the script, but won't forbid them from doing so.

2) Put the script in a subfolder and protect it with a password. This is the best option in your case: what you really don't want is random visitors or spiders disabling your web site. More about how to do that easily with .htaccess here:

http://www.javascriptkit.com/howto/htaccess3.shtml
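
As a rough sketch of option 2, an .htaccess file in the protected subfolder could use Apache's basic authentication. The realm name and the password file path below are placeholders. The password file has to be created separately (for example with Apache's htpasswd tool), and the server configuration must allow authentication directives in .htaccess (AllowOverride AuthConfig).

```apache
# .htaccess in the subfolder holding the release script (paths are placeholders)
AuthType Basic
AuthName "Admin scripts"
AuthUserFile /path/outside/webroot/.htpasswd
Require valid-user
```

Keeping the password file outside the web root prevents it from being downloaded like any other page.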

Wish you best of luck,
Marin
