What's with those requests that have "iframe=true&width=80%&height=80%" as query parameters?

Posted 2025-01-04 03:13:09

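I'm running a Rails 3.2 application. I checked Google Webmaster Tools and found a lot of HTTP 502 errors for random pages. The odd thing is that all of them were crawled with ?iframe=true&width=80%&height=80% as query parameters:

e.g. http://www.mypage.com/anypage?iframe=true&width=80%&height=80%

Of course I don't link to these pages internally, so the links must be external. A Google search confirms this: I see lots of other pages with the same issue.

It seems an external service is creating these links, but why?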


若沐 2025-01-11 03:13:09

I'm seeing these too. Over the past 24 hours I have had 9 hits on one of my pages. They all come from the same IP address, which belongs to Google in Mountain View. None of them has a referrer. Interestingly, half of them carry headers like this:

HTTP_ACCEPT           : */*
HTTP_ACCEPT_ENCODING  : gzip,deflate
HTTP_CONNECTION       : Keep-alive
HTTP_FROM             : googlebot(at)googlebot.com
HTTP_HOST             : mydomain.com
HTTP_USER_AGENT       : Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Interspersed with these are requests from the same IP that have no HTTP headers reported in the exception at all. I'm not sure whether that means the headers aren't being sent, or whether something in the Rails stack prevents them from being recorded because of some other variation in the requests.

The page in question has existed for only about a month, and according to GA it has seen only 5 requests in that time.

All this leads me to believe that someone inside Google is doing something experimental that produces these buggy query string encodings. Rails apps notice it because the unescaped % happens to crash the Rack query string parser, whereas other platforms may be more forgiving.

In the meantime I may monkey patch Rack just to stop it shouting at me, but the ultimate answer about what's going on will have to come from Google (anyone there?).
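To make the failure mode concrete, here is a minimal sketch using only Ruby's standard `URI` module (on a current Ruby, not necessarily the exact Rack code path the answer refers to). The helper name `try_decode` is hypothetical. A literal percent sign must be sent as `%25`; a bare `%` that is not followed by two hex digits is rejected by the decoder:

```ruby
require "uri"

# Decode one query-string value, reporting a decoding failure as a string
# instead of letting the exception propagate. (Hypothetical helper.)
def try_decode(value)
  URI.decode_www_form_component(value)
rescue ArgumentError => e
  "ArgumentError: #{e.message}"
end

# Correctly encoded: "%25" is the escape sequence for a literal "%".
puts try_decode("80%25")

# As sent in the crawled URLs: a trailing "%" with no hex digits after it.
puts try_decode("80%")
```

The second call is the shape of the `width=80%` and `height=80%` values in the crawled URLs, which is why a strict parser rejects them.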


如何视而不见 2025-01-11 03:13:09

You can add this to your initializers to get rid of the errors (with Ruby 1.8.x):

module URI

  # Only patch Ruby 1.8.x; later versions are left untouched.
  major, minor = RUBY_VERSION.split('.').map { |v| v.to_i }

  if major == 1 && minor < 9
    def self.decode_www_form_component(str, enc=nil)
      # Lazily build the lookup table that maps each %XX escape
      # (in every upper/lower-case combination) to its byte.
      if TBLDECWWWCOMP_.empty?
        tbl = {}
        256.times do |i|
          h, l = i>>4, i&15
          tbl['%%%X%X' % [h, l]] = i.chr
          tbl['%%%x%X' % [h, l]] = i.chr
          tbl['%%%X%x' % [h, l]] = i.chr
          tbl['%%%x%x' % [h, l]] = i.chr
        end
        tbl['+'] = ' '
        begin
          TBLDECWWWCOMP_.replace(tbl)
          TBLDECWWWCOMP_.freeze
        rescue
          # Ignore failures here (e.g. the table is already frozen).
        end
      end
      # Escape any "%" that is not followed by two hex digits as "%25"
      # instead of raising, then decode as usual.
      str = str.gsub(/%(?![0-9a-fA-F]{2})/, "%25")
      str.gsub(/\+|%[0-9a-fA-F]{2}/) {|m| TBLDECWWWCOMP_[m]}
    end
  end

end

All this does is encode % symbols that aren't followed by two hex digits as %25 instead of raising an exception. I'm not sure it's such a good idea to monkey patch Rack, though. There must be a valid reason this wasn't done in the gem (maybe security related?).
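The same idea can be tried without overriding anything globally. The sketch below is not the answer's patch; it assumes a current Ruby where `URI.decode_www_form_component` is part of the standard library, and `lenient_decode` is a hypothetical helper name. It reuses the answer's regular expression to escape stray percent signs before handing the string to the stock decoder:

```ruby
require "uri"

# Decode a form-encoded value, treating any "%" that is not followed by
# two hex digits as a literal percent sign rather than an error.
# (Hypothetical helper; it does not modify URI or Rack.)
def lenient_decode(str)
  sanitized = str.gsub(/%(?![0-9a-fA-F]{2})/, "%25")
  URI.decode_www_form_component(sanitized)
end

puts lenient_decode("80%")      # stray "%" kept as a literal
puts lenient_decode("a%20b+c")  # valid escapes still decode normally
```

Keeping the leniency in a named helper makes it explicit where malformed input is tolerated, which may be easier to audit than a global override.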


七度光 2025-01-11 03:13:09

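I just learned more about this issue. According to Google Webmaster Tools, all of the links appear to come from spidername.com. It looks like they append these parameters to the URL and, when you click the link, somehow use an iframe to display the content, probably with JavaScript that checks whether the URL contains the iframe= query parameter. Googlebot, however, goes straight to the iframe URL, and that is what causes the problem.

I decided to solve this with a redirect rule in nginx.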
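The nginx rule itself is not shown in the answer. For comparison, a similar redirect could be sketched inside the application as a small Rack-style middleware. Everything here is an assumption for illustration: the class name `StripIframeParams`, the choice of a 301 status, and dropping the whole query string. It inspects the raw `QUERY_STRING`, so the malformed values are never passed to the query parser:

```ruby
# Hypothetical Rack-style middleware: redirect any request whose raw query
# string contains iframe=true to the same path without a query string.
class StripIframeParams
  MARKER = /(\A|&)iframe=true(&|\z)/

  def initialize(app)
    @app = app
  end

  def call(env)
    query = env["QUERY_STRING"].to_s
    # Pass everything else through untouched.
    return @app.call(env) unless query.match?(MARKER)

    location = env["PATH_INFO"].to_s
    location = "/" if location.empty?
    [301, { "Location" => location, "Content-Type" => "text/plain" }, ["Moved Permanently"]]
  end
end

# Usage with a stand-in application:
app = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["ok"]] }
middleware = StripIframeParams.new(app)
status, headers, = middleware.call(
  "PATH_INFO" => "/anypage",
  "QUERY_STRING" => "iframe=true&width=80%&height=80%"
)
puts "#{status} #{headers['Location']}"
```

Handling this in the web server, as the answer does, keeps such requests away from the application entirely; the middleware variant is only an option when the server configuration is out of reach.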
