How to avoid double Google indexing using .htaccess?
I have a website, with a nice RewriteRule in its root, that redirects all the queries of this kind:
http://domain.com/foo/parameter
into
http://domain.com/index.php?args=parameter
Users can only see the clean URL and everyone is happy.
Now here is the problem: the DNS for domain.com has an A record for domain.com, pointing to a private server IP, and an A record for mail.domain.com, pointing to the exact same IP.
For some unknown reason, in the last couple of months, Google has double-indexed all the pages of my site: alongside http://domain.com/foo/par1, http://domain.com/foo/par2, etc., there is a second set under the mail subdomain (http://mail.domain.com/foo/par1, http://mail.domain.com/foo/par2, etc.).
I thought I could get rid of all of them by redirecting any request for mail.domain.com/$whatever to domain.com, so that eventually Google would understand that all those pages on the 'mail' subdomain redirect to the homepage and are therefore not necessary.
I tried this in .htaccess:
RewriteCond %{HTTP_HOST} ^mail.domain.com$ [NC]
RewriteRule ^(.*)$ http://domain.com [R=301,L]
But this redirects to a visible URL that looks like this: http://domain.com/index.php?args=parameter, while I just want a redirect to the homepage.
What's the correct form, and are there more elegant ways to achieve this, maybe adding something into robots.txt? (Please note that I can't just disallow a subfolder here)
Comments (2)
If you just want to redirect to the home page, discarding the original REQUEST_URI and QUERY_STRING, then use these rules:
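A minimal sketch of such rules, assuming the host names from the question and Apache mod_rewrite in the root .htaccess (a reconstruction from the description, not necessarily the answerer's exact rules):

```apache
RewriteEngine On

# Only act on requests addressed to the mail subdomain.
# The dots are escaped so they match literally (an assumption added here).
RewriteCond %{HTTP_HOST} ^mail\.domain\.com$ [NC]

# Redirect every path to the homepage. The substitution contains no $1,
# so the original path is dropped, and the trailing ? drops the query string.
RewriteRule ^ http://domain.com/? [R=301,L]
```

On Apache 2.4 or later, the QSD flag (for example [R=301,L,QSD] with a target of http://domain.com/) should have the same effect as the trailing ?.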
By putting ? at the end of the target, it will strip out the original query string, so a URL of this type: http://mail.domain.com/index.php?args=parameter will become http://domain.com/
Your rule is correct, but you need to put it before all the other rules (right after RewriteEngine On), or it will pick up the latest state of the internally rewritten URL.
Update: Hmm, you said that your old rule redirects correctly but uses the internal, ugly URL. That actually shouldn't be the case unless you add $1 to pick out the matched string.
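As a rough illustration of the ordering point, here is a sketch of a root .htaccess with the host redirect placed ahead of the clean-URL rewrite. The clean-URL rule is a hypothetical stand-in, since the question does not show the original one:

```apache
RewriteEngine On

# 1. External redirect first, so it is evaluated against the URL as the
#    client requested it, before any internal rewrite has changed it.
RewriteCond %{HTTP_HOST} ^mail\.domain\.com$ [NC]
RewriteRule ^ http://domain.com/? [R=301,L]

# 2. Internal clean-URL rewrite afterwards.
#    Hypothetical reconstruction of the rule described in the question:
#    /foo/parameter is served by index.php?args=parameter.
RewriteRule ^foo/(.+)$ index.php?args=$1 [L]
```

With the order reversed, the per-directory rules run again after the internal rewrite, so the redirect could be evaluated against index.php?args=parameter rather than the clean URL.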