Will this robots.txt only allow Googlebot to index my site?

Posted 2024-09-25 06:29:18

Will this robots.txt file allow only Googlebot to index my site's index.php file? One caveat: I have an .htaccess redirect, so people who type in

http://www.example.com/index.php

are redirected to simply

http://www.example.com/

So, this is my robots.txt file content...

User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /

Thanks in advance!


Comments (3)

ま柒月 2024-10-02 06:29:18

Not really.

Good bots
Only "good" bots follow the robots.txt instructions (not all robots and spiders bother to read or follow robots.txt). That might not even include all of the major search engines' bots, and it definitely means that some web crawlers will completely ignore your requests. If you really want to stop bots and crawlers from seeing parts of your site, you should look at using .htaccess rules or password protection.

Second checks
Google makes multiple visits to your website, including visits where it appears as an ordinary browsing user. This second visit will ignore the robots.txt file. The second visit probably doesn't actually index anything (if that's your worry), but it does check that you're not trying to fool the indexing bot (for SEO, etc.).

That being said, your syntax is right. If that's all you're asking, then yes, it'll work, just not as well as you might hope.

罪#恶を代价 2024-10-02 06:29:18

Absent the redirect, Googlebot would not see your site, except for the index.php.

With the redirect, it depends on how the bot handles redirects and how your .htaccess performs the redirect. If you return a 302, then Googlebot will see http://www.example.com/, check it against robots.txt, and not see the main site. Even if you do an internal redirect and tell Googlebot that the responding page is http://www.example.com/, it will see the page but might not index it.
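To make the point about the redirect target concrete, here is a minimal sketch that feeds the robots.txt from the question to Python's standard-library urllib.robotparser and asks which URLs a given user agent may fetch. This is only an approximation: Python's parser is not Google's, and its rule-precedence logic may differ from Googlebot's in other cases. "SomeOtherBot" is a made-up user agent used purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, copied verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "SomeOtherBot" is a hypothetical crawler name, not a real bot.
checks = [
    ("Googlebot", "http://www.example.com/index.php"),
    ("Googlebot", "http://www.example.com/"),
    ("SomeOtherBot", "http://www.example.com/index.php"),
]

for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:<13} {url:<35} {verdict}")
```

With this parser, only the Googlebot request for /index.php is reported as allowed. The root URL that the redirect points to is blocked even for Googlebot, which is the problem described above.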

め七分饶幸 2024-10-02 06:29:18

It's risky. To make sure that Google does index your homepage, use this:

User-agent: *
Allow: /index.php
Disallow: /a
Disallow: /b
...
Disallow: /z
Disallow: /0
...
Disallow: /9

That way, your root "/" will not match any of the Disallow rules.

Also, if you have AdSense, don't forget to add:

User-agent: Mediapartners-Google
Allow: /
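As a rough check of the per-prefix approach, the sketch below builds that rule set programmatically and tests a few paths with Python's standard-library urllib.robotparser. It assumes that the elided "..." lines stand for every remaining lowercase letter and digit, which the answer does not spell out. The test paths (/about.html, /2024/post, /About.html) are invented for illustration, and Python's parser is only a stand-in for how Googlebot evaluates rules.

```python
import string
from urllib.robotparser import RobotFileParser

# Assumption: the "..." in the answer covers all of a-z and 0-9.
prefixes = string.ascii_lowercase + string.digits

lines = ["User-agent: *", "Allow: /index.php"]
lines += [f"Disallow: /{char}" for char in prefixes]

parser = RobotFileParser()
parser.parse(lines)

BASE = "http://www.example.com"

# Hypothetical paths, chosen only to exercise the rules.
for path in ["/", "/index.php", "/about.html", "/2024/post", "/About.html"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", BASE + path) else "blocked"
    print(f"{path:<12} {verdict}")
```

Under these assumptions, "/" and /index.php come out as allowed, while /about.html and /2024/post are blocked. Note that /About.html is also reported as allowed: path matching is case-sensitive, so a lowercase-only list would leave paths that start with an uppercase letter or a symbol uncovered.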