Robots.txt - how to set up rules so that pages with parameters are not indexed
We recently added a new section to our website. Essentially, it's a shopping product catalog that allows filtering on different attributes, helping the visitor whittle the results down to what they need.
The filter parameters are passed in the URL, and I know Google will index these pages as distinct pages even though they hold essentially the same content. I know I can specify which pages Google (and other search engines) may index by setting up the appropriate rules in the robots.txt file.
This is one of the pages in question: http://www.reyniersaudio.com/recording-computer-studio-gear/audio-interfaces
As you'll see, if you select any of the filters on the right side of the page or choose a "Sort By" option, it sends you to a new page that has the same URL plus a string starting with "&filters=" or "&order_by=".
What rule should I add to my robots.txt that will tell the search engines not to index those redundant pages?
2 Answers
Google Webmaster Tools lets you tell Google directly how to interpret your various URL parameters, so there is no need to use robots.txt for this.
If you want to prevent Googlebot from crawling URLs that contain "&", you can write:
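    User-agent: Googlebot
    # "*" matches any sequence of characters, so this blocks any URL containing "&"
    Disallow: /*&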
Or, if you don't want any bot to do it, just replace Googlebot in that with *.

Note that not all bots handle wildcards. Googlebot and MSN bot (whatever it's called these days) do. I think Blekko's bot does. I know that mine does, as well. Some might not, as wildcards aren't part of the original robots.txt specification (which never was a "real" standard).
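As a sketch of that substitution, narrowed to just the parameters named in the question (assuming "&filters=" and "&order_by=" are the only parameters that create duplicate pages):

    User-agent: *
    # Block only the filtered/sorted page variants instead of every URL with "&"
    Disallow: /*&filters=
    Disallow: /*&order_by=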