Avoiding duplicate content on Google for archived pages?

Posted on 2024-11-17 15:52:51


Each blog post on my site -- http://www.correlated.org -- is archived at its own permalinked URL.

On each of these archived pages, I'd like to display not only the archived post but also the 10 posts that were published before it, so that people can get a better sense of what sort of content the blog offers.

My concern is that Google and other search engines will consider those other posts to be duplicate content, since each post will appear on multiple pages.

On another blog of mine -- http://coding.pressbin.com -- I had tried to work around that by loading the earlier posts as an AJAX call, but I'm wondering if there's a simpler way.

Is there any way to signal to a search engine that a particular section of a page should not be indexed?

If not, is there an easier way than an AJAX call to do what I'm trying to do?


Comments (3)

浪菊怪哟 2024-11-24 15:52:51


Caveat: this hasn't been tested in the wild, but should work based on my reading of the Google Webmaster Central blog and the schema.org docs. Anyway...


This seems like a good use case for structuring your content using microdata. This involves marking up your content as a Rich Snippet of the type Article, like so:

   <div itemscope itemtype="http://schema.org/Article" class="item first">
      <h3 itemprop="name">August 13's correlation</h3>        
      <p itemprop="description" class="stat">In general, 27 percent of people have never had any wisdom teeth extracted. But among those who describe themselves as pessimists, 38 percent haven't had wisdom teeth extracted.</p>
      <p class="info">Based on a survey of 222 people who haven't had wisdom teeth extracted and 576 people in general.</p>
      <p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>  
   </div>

Note the use of itemscope, itemtype and itemprop to define each article on the page.

Now, according to schema.org, which is supported by Google, Yahoo and Bing, the search engines should respect the canonical url described by the itemprop="url" above:

Canonical references

Typically, links are specified using the <a> element. For example, the
following HTML links to the Wikipedia page for the book Catcher in the
Rye.

   <div itemscope itemtype="http://schema.org/Book">
      <span itemprop="name">The Catcher in the Rye</span>—
      by <span itemprop="author">J.D. Salinger</span>
      Here is the book's <a itemprop="url"
         href="http://en.wikipedia.org/wiki/The_Catcher_in_the_Rye">Wikipedia page</a>.
   </div>

So when marked up in this way, Google should be able to correctly ascribe which piece of content belongs to which canonical URL and weight it in the SERPs accordingly.
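Applied to the archive pages in the question, that might look something like the sketch below: each earlier post repeated on a permalink page gets its own Article block whose itemprop="url" points back at that post's own permalink. The second post's number, title and permalink here are made up for illustration:

   <!-- the post this permalink page is canonical for -->
   <div itemscope itemtype="http://schema.org/Article" class="item first">
      <h3 itemprop="name">August 13's correlation</h3>
      <p itemprop="description" class="stat">...</p>
      <p class="social"><a itemprop="url" href="http://www.correlated.org/153">Link to this statistic</a></p>
   </div>

   <!-- one of the earlier posts shown for context; its itemprop="url" points at its own permalink -->
   <div itemscope itemtype="http://schema.org/Article" class="item">
      <h3 itemprop="name">August 12's correlation</h3>
      <p itemprop="description" class="stat">...</p>
      <!-- hypothetical permalink for the earlier post -->
      <p class="social"><a itemprop="url" href="http://www.correlated.org/152">Link to this statistic</a></p>
   </div>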

Once you're done marking up your content, you can test it using the Rich Snippets testing tool, which should give you a good indication of what Google thinks about your pages before you roll it into production.


P.S. The most important thing you can do to avoid a duplicate content penalty is to fix the titles on your permalink pages. Currently they all read 'Correlated - Discover surprising correlations', which will cause your ranking to take a massive hit.
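For example (the wording is only a guess at what a per-post title might look like), each permalink page would carry its own unique title instead of the site-wide one:

   <!-- hypothetical unique title for http://www.correlated.org/153 -->
   <title>August 13's correlation - Correlated</title>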

等待我真够勒 2024-11-24 15:52:51


I'm afraid I don't think it is possible to tell a search engine that a specific area of your web page should not be indexed (for example, a div in your HTML source). A workaround would be to use an iframe for the content you do not want search engines to index, and then use a robots.txt file with an appropriate Disallow rule to deny access to the specific file loaded into the iframe.
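A minimal sketch of that approach, assuming the earlier posts are rendered into a separate file under a made-up /related/ path that only the iframe loads:

   <!-- on the permalink page: the earlier posts live in a separate document -->
   <iframe src="/related/153.html" width="100%" height="600" frameborder="0"></iframe>

and then robots.txt at the site root keeps crawlers away from those iframe source files:

   # block crawlers from the files the iframes load
   User-agent: *
   Disallow: /related/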

烟柳画桥 2024-11-24 15:52:51


You can't tell Google to ignore portions of a web page but you can serve up that content in such a way that the search engines can't find it. You can either place that content in an <iframe> or serve it up via JavaScript.

I don't like those two approaches because they're hackish. Your best bet is to completely block those pages from the search engines since all of the content is duplicated anyway. You can accomplish that a few ways:

  1. Block your archives using robots.txt. If your archives are in their own directory then you can block the entire directory easily. You can also block individual files and use wildcards to match patterns.

  2. Use the <META NAME="ROBOTS" CONTENT="noindex"> tag to block each page from being indexed.

  3. Use the X-Robots-Tag: noindex HTTP header to block each page from being indexed by the search engines. This is identical in effect to using the <meta> robots tag above, although this one can be easier to implement since you can use it in a .htaccess file and apply it to an entire directory (see the sketch after this list).
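As a rough sketch of option 3, assuming Apache with mod_headers enabled and the archive pages living in their own (hypothetical) /archives/ directory:

   # .htaccess placed inside /archives/
   # sends an "X-Robots-Tag: noindex" header with every response served from this directory
   Header set X-Robots-Tag "noindex"

Unlike the robots.txt approach, the pages are still crawled, but they should drop out of the index.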
