How do tools like the SEOZZ rank checker work?

Posted on 2024-10-21 20:11:47

It seems there are a number of tools that allow you to check a site's position in search results for long lists of keywords. I'd like to integrate a feature like that in an analytics project I'm working on, but I cannot think of a way to run queries at such high volumes (1000s per hour) without violating the Google TOS and potentially running afoul of their automatic query detection system (the one that institutes a CAPTCHA if search volume at your IP gets too high).

Is there an alternative method for running these automated searches, or is the only way forward to scrape search result pages?

Comments (2)

北风几吹夏 2024-10-28 20:11:47

Use a third party to scrape it if you're scared of Google's TOS.
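
For what it's worth, integrating a third-party SERP provider into an analytics project usually comes down to one HTTP call per keyword. Below is a minimal Python sketch of that shape; the endpoint, parameter names, and response fields are hypothetical placeholders for illustration, not any particular provider's API.

```python
# A minimal sketch of handing SERP collection off to a third-party service.
# The endpoint, parameter names, and response shape are hypothetical
# placeholders -- substitute whatever provider you actually use.
from typing import Optional

import requests

API_KEY = "YOUR_API_KEY"                               # hypothetical credential
ENDPOINT = "https://api.serp-provider.example/search"  # hypothetical endpoint


def rank_for_keyword(keyword: str, domain: str, pages: int = 10) -> Optional[int]:
    """Return the 1-based position of `domain` for `keyword`, or None if absent."""
    resp = requests.get(
        ENDPOINT,
        params={"q": keyword, "num": pages * 10, "api_key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response format: {"organic_results": [{"url": ...}, ...]}
    for position, result in enumerate(resp.json().get("organic_results", []), start=1):
        if domain in result.get("url", ""):
            return position
    return None
```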

兰花执着 2024-10-28 20:11:47

Google is very quick to temporarily ban or block IP addresses that appear to be sending automated queries. And yes, of course, this is against their TOS.

It's also quite difficult to know exactly how they detect them, but the main trigger is certainly repeated identical keyword searches from the same IP address.

The short answer is basically: get a lot of proxies.

Some more tips:

  • Don't search further than you need to (e.g. the first 10 pages)
  • Wait around 4-5 seconds between queries for the same keyword
  • Make sure you use real browser headers, not something like "CURL..."
  • Stop scraping with an IP when you hit the roadblocks, and wait a few days before using the same proxy again.
  • Try to make your program act like a real user would, and you won't have too many issues.

You can scrape Google quite easily, but doing it at very high volume will be challenging.
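
To make that list concrete, here is a rough Python sketch of the pattern: rotate through a proxy pool, send realistic browser headers, wait about 4-5 seconds between result pages for the same keyword, and rest a proxy once it appears blocked. The proxy addresses, the User-Agent string, and the block-detection heuristic are assumptions for illustration, not values from this answer.

```python
# A rough sketch of the advice above: rotate proxies, send realistic browser
# headers, pace queries for the same keyword, and rest a proxy once it looks
# blocked. The proxy addresses, User-Agent string, and block-detection
# heuristic are illustrative assumptions.
import time

import requests

PROXIES = [                                   # assumed pool of proxies you control
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]
HEADERS = {
    # A real browser User-Agent instead of the default "python-requests/..."
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}

blocked = set()                               # proxies to rest for a few days


def looks_blocked(response: requests.Response) -> bool:
    """Heuristic: treat 429/503 or a CAPTCHA page as a block (assumption)."""
    return response.status_code in (429, 503) or "captcha" in response.text.lower()


def fetch_serp_pages(keyword: str, max_pages: int = 10) -> list:
    """Fetch only as many result pages as you need, ~4-5 seconds apart."""
    pages = []
    for page in range(max_pages):
        while True:
            available = [p for p in PROXIES if p not in blocked]
            if not available:
                raise RuntimeError("all proxies blocked; wait a few days")
            proxy = available[page % len(available)]   # simple rotation
            resp = requests.get(
                "https://www.google.com/search",
                params={"q": keyword, "start": page * 10},
                headers=HEADERS,
                proxies={"http": proxy, "https": proxy},
                timeout=30,
            )
            if looks_blocked(resp):
                blocked.add(proxy)            # stop using this IP for now
                continue                      # retry this page on another proxy
            pages.append(resp.text)
            break
        time.sleep(4.5)                       # pause between queries for the same keyword
    return pages
```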
