What are those requests with "iframe=true&width=80%&height=80%" as query parameters?

Posted on 2025-01-04 03:13:09

I'm running a Rails 3.2 app. I checked Google Webmaster Tools and saw lots of HTTP 502 errors for random pages. The weird thing is that all of them were crawled with ?iframe=true&width=80%&height=80% as the query string:

e.g. http://www.mypage.com/anypage?iframe=true&width=80%&height=80%

I certainly don't link to those pages like that internally, so the links must be external. A Google search confirms this: I see lots of other sites with the same issue.

It seems like an external service creates those links, but why?

Comments (4)

若沐 2025-01-11 03:13:09

I'm seeing these too. Over the past 24 hours I've had 9 hits on one of my pages. They all come from the same IP address, which belongs to Google in Mountain View. None of them has a referrer. Interestingly, half of them have headers like this:

HTTP_ACCEPT           : */*
HTTP_ACCEPT_ENCODING  : gzip,deflate
HTTP_CONNECTION       : Keep-alive
HTTP_FROM             : googlebot(at)googlebot.com
HTTP_HOST             : mydomain.com
HTTP_USER_AGENT       : Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

But then interspersed are requests from the same IP that don't have any HTTP headers reported in the exception. I'm not sure if this means they aren't being sent, or if something in the Rails stack is preventing the headers from getting recorded due to some other variation in the requests. In any case the requests are interspersed.

The page in question has existed for only about a month, and it's only seen 5 requests during that time according to GA.

All this leads me to believe that someone inside Google is doing something experimental that produces these buggy query string encodings, and that Rails apps notice it because it happens to crash the Rack query string parser, whereas other platforms may be more forgiving.

In the meantime I may monkey patch Rack just to stop it shouting at me, but the ultimate answer about what's going on will have to come from Google (anyone there?).
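
To illustrate why a parser can choke on this query string: a `%` that is not followed by two hex digits (as in `80%&`) is not valid percent-encoding. The following is a minimal sketch using the standard library's `URI.decode_www_form_component` on a recent Ruby, which is an assumption; the Rack version in the stack described above may fail at a different point. The helper name `decodable?` is hypothetical.

```ruby
require "uri"

# Hypothetical helper: returns true if the value can be percent-decoded,
# false if the standard library rejects it as invalid %-encoding.
def decodable?(value)
  URI.decode_www_form_component(value)
  true
rescue ArgumentError
  false
end

query = "iframe=true&width=80%&height=80%"

# Check each key=value pair of the raw query string separately.
query.split("&").each do |pair|
  key, value = pair.split("=", 2)
  puts "#{key}: #{decodable?(value.to_s) ? 'ok' : 'invalid %-encoding'}"
end
```

Had the link generator escaped the percent sign as `%25` (giving `width=80%25`), the values would decode without error.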

如何视而不见 2025-01-11 03:13:09

You can add this to your initializers to get rid of the errors (with Ruby 1.8.x):

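module URI

  major, minor = RUBY_VERSION.split('.').map { |v| v.to_i }

  # Only override on Ruby 1.8.x. This assumes TBLDECWWWCOMP_ is already
  # defined, as it is when Rack's URI backport has been loaded.
  if major == 1 && minor < 9
    def self.decode_www_form_component(str, enc=nil)
      # Lazily build the "%XX" => byte lookup table, in every hex-case variant.
      if TBLDECWWWCOMP_.empty?
        tbl = {}
        256.times do |i|
          h, l = i>>4, i&15
          tbl['%%%X%X' % [h, l]] = i.chr
          tbl['%%%x%X' % [h, l]] = i.chr
          tbl['%%%X%x' % [h, l]] = i.chr
          tbl['%%%x%x' % [h, l]] = i.chr
        end
        tbl['+'] = ' '
        begin
          TBLDECWWWCOMP_.replace(tbl)
          TBLDECWWWCOMP_.freeze
        rescue
          # Ignore failures here (for example, if the table is already frozen).
        end
      end
      # Escape any "%" that is not followed by two hex digits as "%25"...
      str = str.gsub(/%(?![0-9a-fA-F]{2})/, "%25")
      # ...then decode "+" and every valid "%XX" sequence.
      str.gsub(/\+|%[0-9a-fA-F]{2}/) {|m| TBLDECWWWCOMP_[m]}
    end
  end

end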

All this does is encode % symbols that aren't followed by two hex digits as %25 instead of raising an exception. I'm not sure it's a good idea to monkey patch Rack, though. There must be a valid reason this wasn't done in the gem (maybe security related?).
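
If patching `URI` feels too invasive, the same escaping could be applied to the raw query string before anything tries to parse it. This is a rough sketch under assumptions, not a tested drop-in: the class name `StrayPercentEscaper` is hypothetical, and it relies only on the Rack convention that middleware responds to `call(env)` and that the query string lives in `env["QUERY_STRING"]`.

```ruby
# Hypothetical middleware (not part of Rack or Rails): rewrites stray "%"
# characters in the raw query string to "%25" before downstream code parses it.
class StrayPercentEscaper
  # A "%" that is not followed by two hex digits.
  STRAY_PERCENT = /%(?![0-9a-fA-F]{2})/

  def initialize(app)
    @app = app
  end

  def call(env)
    query = env["QUERY_STRING"]
    env["QUERY_STRING"] = query.gsub(STRAY_PERCENT, "%25") if query
    @app.call(env)
  end
end

# Stand-in downstream app that echoes the query string it received.
echo = lambda { |env| [200, { "Content-Type" => "text/plain" }, [env["QUERY_STRING"]]] }

status, _headers, body = StrayPercentEscaper.new(echo).call(
  "QUERY_STRING" => "iframe=true&width=80%&height=80%"
)
puts status
puts body.first
```

To have any effect, such a middleware would need to sit ahead of whatever component parses the parameters. It leaves valid escapes such as `%20` untouched, so well-formed requests pass through unchanged.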

七度光 2025-01-11 03:13:09

I just found out more about this issue. According to Google Webmaster Tools, all the links are coming from spidername.com. It looks like they append those parameters to the URL, and when you click the link, the content is somehow shown in an iframe, probably by using JavaScript to check whether the URL contains the iframe= query param. Googlebot, however, goes straight to the iframe URL. That is what causes the issue.

I decided to use a redirect rule in nginx to solve the issue.
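
The nginx rule itself isn't shown. As an illustration of the same idea at the application layer (an assumption, not the nginx configuration mentioned above), a Rack-style middleware could answer such requests with a redirect to the bare path. The class name `IframeParamRedirect` is hypothetical.

```ruby
# Hypothetical middleware: a stand-in for the nginx redirect rule, which is
# not shown. Redirects requests carrying "iframe=true" to the same path
# without any query string.
class IframeParamRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    # Inspect the raw query string so nothing attempts to percent-decode it.
    params = env["QUERY_STRING"].to_s.split("&")
    if params.include?("iframe=true")
      headers = { "Location" => env["PATH_INFO"], "Content-Type" => "text/plain" }
      [301, headers, ["Moved Permanently"]]
    else
      @app.call(env)
    end
  end
end

# Stand-in downstream app.
ok = lambda { |_env| [200, { "Content-Type" => "text/plain" }, ["OK"]] }
app = IframeParamRedirect.new(ok)

status, headers, _body = app.call(
  "PATH_INFO" => "/anypage",
  "QUERY_STRING" => "iframe=true&width=80%&height=80%"
)
puts "#{status} -> #{headers['Location']}"
```

Note that this drops the entire query string, which is only reasonable if the affected pages don't rely on other parameters.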

热鲨 2025-01-11 03:13:09

I have the same issue. I am worried that these are third-party spam links intended to lower my site's Google ranking.
