Submit an RSS feed to Google as a sitemap?



Background

I work for an online media company that hosts a news site with over 75K pages. We currently use Google Sitemap Generator (installed on our server) to build dynamic XML sitemaps for our site. In fact, since we have a ton of content, we use a sitemap of sitemaps. (Google only allows a maximum of 50K URLs per sitemap file.)
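For context, the "sitemap of sitemaps" is the sitemap index format from the sitemaps.org protocol. A minimal sketch, with example.com placeholders standing in for our real files and dates:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each child sitemap may list up to 50,000 URLs;
       the index file just points Google at all of them. -->
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
    <lastmod>2024-09-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
    <lastmod>2024-09-14</lastmod>
  </sitemap>
</sitemapindex>
```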

Problem

The sitemaps are generated every 12 hours and are driven by user behavior. That is, the generator parses the server log files, sees which pages are being fetched the most, and builds the sitemaps based on that.

Since we cannot guarantee that NEW pages are being added to the sitemap, is it better to submit a sitemap as an RSS feed? That way, every time one of our editors creates a new page (or article), it is added to the feed and submitted to Google. But this brings up the issue of pushing duplicate content to Google, as the sitemap and the RSS feed might contain the same URLs. Will Google penalize us for duplicate content? How do other content-rich or media sites notify Google that they are posting new content?
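For what it's worth, Google does accept RSS 2.0 and Atom feeds as sitemaps; when it fetches a feed it reads each item's <link> and <pubDate>. A minimal sketch of such a feed, with example.com placeholders for our real article URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://www.example.com/</link>
    <description>Latest articles</description>
    <!-- One item per newly published article -->
    <item>
      <title>New article headline</title>
      <link>https://www.example.com/articles/new-article</link>
      <pubDate>Sat, 21 Sep 2024 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```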

I understand that Googlebot only indexes pages it deems important and relevant, but it would be great if it at least crawled any new article that we post.

Any help would be greatly appreciated.


Comments (1)

娇妻 2024-09-21 21:54:57


Why not simply have every page in your sitemap? 75K pages isn't a huge number; plenty of sites have several sitemaps totalling millions of pages, and Google will digest them all (although Google will only index the pages it deems important, as you pointed out).

One technique for you would be to split the sitemaps into New and Archived content based on publication date: for example, a single sitemap for all content from the previous 7 days, with the rest of the content split into other sitemap files as appropriate. This may help get your freshest content indexed quickly; see the sketch below.
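A minimal sketch of what that "new content" sitemap could look like, assuming a hypothetical sitemap-new.xml that your sitemap index references alongside the archive files; because it only ever holds a week of URLs, it is cheap to regenerate on every publish:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap-new.xml: only articles published in the last 7 days -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/articles/todays-story</loc>
    <lastmod>2024-09-21</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/articles/yesterdays-story</loc>
    <lastmod>2024-09-20</lastmod>
  </url>
</urlset>
```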

Back to your question about an RSS feed sitemap: don't worry about duplicate content, as this is not an issue when it comes to sitemaps. Duplicate content is only a problem if you publish the same article several times on the site. Sitemaps and RSS feeds are only links to the content, not the content itself, so if an RSS feed is the easiest way of reporting your fresh content to Google, go for it.
