mod_rewrite and redundant/old URLs: some SEO best practice needed

Posted on 2024-09-06 06:15:09

I've been having a look at how Google perceives our site at the moment, and we're coming up short. Basically, we use a bog-standard URL rewriting structure to make our URLs look SEO-friendly.

For instance, a product URL takes the shape any_string_([0-9]+).html, and so forth. Of course, this lets us put whatever we want before the product ID, which we have done. In the past, a product page was Product_Name_79.html; it then became Brand_Name_Product_Name_79.html. Apache does not really care, and ID 79 gets passed on in either case. However, Google now has two versions of this product cached under different URLs, which is not a good thing, as it keeps coming back to the first URL and spidering it.
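To make the duplication concrete, here is a minimal Python sketch of the matching described above. The regex is an assumed stand-in for the site's actual RewriteRule (which isn't shown), and `product_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed Python equivalent of the rewrite pattern described above:
# anything, an underscore, a numeric ID, then ".html".
PRODUCT_URL = re.compile(r"^/?(?P<slug>.+)_(?P<id>[0-9]+)\.html$")


def product_id(path: str) -> Optional[int]:
    """Return the product ID encoded at the end of the path, or None."""
    match = PRODUCT_URL.match(path)
    return int(match.group("id")) if match else None


old_url = "Product_Name_79.html"
new_url = "Brand_Name_Product_Name_79.html"

# Both URLs resolve to the same product, so a crawler sees the same
# content under two different addresses.
same_product = product_id(old_url) == product_id(new_url)
```

Because only the trailing ID matters to the pattern, any text in front of `_79.html` serves the same page, which is how the old and new URLs both stay live.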

The same thing applies to our rewrite rules for brands and categories, some of which have been dropped and some of which have been modified.

There are over 11k URLs in site:domain, whereas our sitemap has only some 5.8k. How would you prevent spiders from fetching older versions of URLs that you no longer link to (considering it's not a manual process, and such URLs can often be very dynamic)?

E.g., Mens_Merrell_Trail_Running_Shoes__50-100__10____024/ is a dynamic URL for the Merrell brand, narrowed down to items in trail running shoes that cost between 50 and 100, in size 10, with gender set to men's.

If we decide to nofollow any size and price filter URLs, Google is still able to access them through its old cache...

What is the best practice for disallowing a particular type of URL? As the combinations above are nearly infinite, I cannot produce a list, and a list certainly could not be backdated against whatever brands and categories Google may hold for us historically.

Should we add noindex when such filters are applied? Should we export them to robots.txt? Do nothing, in the hope that Google stops coming back?
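For illustration, here is a rough Python sketch of the first option (noindex when filters are applied), sent as an `X-Robots-Tag` response header. The `FILTER_URL` pattern is an assumption based only on the single example URL above (trailing slash, filter values separated by double underscores), and `robots_headers` is a hypothetical helper, not part of any framework.

```python
import re
from typing import Dict

# Assumption: filtered listings end in "/" and carry their filter values in
# segments separated by double underscores, as in the example URL above.
FILTER_URL = re.compile(r"^/?[^/]*__[^/]*/$")


def robots_headers(path: str) -> Dict[str, str]:
    """Extra response headers for a path: noindex for filtered listings."""
    if FILTER_URL.match(path):
        # "noindex, follow": keep the page out of the index, but let
        # crawlers keep following the links on it.
        return {"X-Robots-Tag": "noindex, follow"}
    return {}
```

A pattern-based check like this avoids having to list every filter combination. Note that a crawler has to be able to fetch the URL to see the header, so a URL that is also disallowed in robots.txt would never have its noindex read.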

To put it into perspective, we have 2600 product page URLs that are now redundant or disabled. What would you do with them? Redirect to the homepage, redirect to the brand page, return a 404, or do nothing?

Thanks for any advice.

Comments (2)

唯憾梦倾城 2024-09-13 06:15:09

I think you're looking for rel="canonical". Google should start ignoring your old links if they're really no longer linked to. You can check any incoming links with a tool like this: http://www.seomoz.org/linkscape.
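As a rough sketch of what that could look like, assuming the page template can look up one preferred URL per product ID (the `CANONICAL_PATHS` table, the `canonical_link` helper and the `https://www.example.com` base URL are all made up for illustration):

```python
from html import escape

# Hypothetical lookup of the one preferred path per product ID; a real site
# would read this from its product database.
CANONICAL_PATHS = {79: "/Brand_Name_Product_Name_79.html"}


def canonical_link(product_id: int, base_url: str = "https://www.example.com") -> str:
    """Build the <link rel="canonical"> element for a product page's <head>."""
    href = base_url + CANONICAL_PATHS[product_id]
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'
```

The same tag would be emitted whichever URL variant was requested, so Product_Name_79.html and Brand_Name_Product_Name_79.html both point to the one preferred address.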

Also, if your old URLs match (or don't match) a consistent pattern, you could set up a 301 redirect in Apache, either for pages matching the old pattern or for pages not matching the new pattern...
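Here is a minimal sketch of the "not matching the new pattern" idea, written as application-level Python rather than Apache configuration so the logic is easy to test. The `CURRENT_SLUGS` table and the `redirect_target` helper are hypothetical, and the URL regex is an assumed equivalent of the site's rewrite rule.

```python
import re
from typing import Optional

# Assumed equivalent of the product rewrite pattern: slug, "_", ID, ".html".
PRODUCT_URL = re.compile(r"^/(?P<slug>.+)_(?P<id>[0-9]+)\.html$")

# Hypothetical current slugs keyed by product ID.
CURRENT_SLUGS = {79: "Brand_Name_Product_Name"}


def redirect_target(path: str) -> Optional[str]:
    """Return the path to 301 to, or None if the request needs no redirect."""
    match = PRODUCT_URL.match(path)
    if not match:
        return None  # not a product URL; leave it to other rules
    current = CURRENT_SLUGS.get(int(match.group("id")))
    if current is None or current == match.group("slug"):
        return None  # unknown product, or already on the current URL
    return f"/{current}_{match.group('id')}.html"
```

A request for `/Product_Name_79.html` would get a 301 to `/Brand_Name_Product_Name_79.html`, while the current URL is served as normal. Comparing against the current slug means no list of historical slugs is needed.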

Hope this helps!

椒妓 2024-09-13 06:15:09

Just be sure to set up redirects for any URL you change. Also, I don't recommend using rel=nofollow since it indicates to Google that your site is not trustworthy.
