Indy - IdHttp: how to handle page redirects?

Posted on 2024-10-09 11:50:40

Using: Delphi 2010, latest version of Indy

I am trying to scrape the data off Google's AdSense web pages, with the aim of getting the reports. However, I have been unsuccessful so far: it stops after the first request and does not proceed.

Using Fiddler to debug the traffic to the Google AdSense website while loading the AdSense page in a web browser, I can see that the browser's request generates a number of redirects before the page is loaded.

However, my Delphi application is only generating a couple of requests before it stops.

Here are the steps I have followed:

  1. Drop a TIdHTTP and a TIdSSLIOHandlerSocketOpenSSL component on the form.
  2. Set the IdHTTP component properties AllowCookies and HandleRedirects to True, and its IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
  3. Set the IdSSLIOHandlerSocketOpenSSL1 component property SSLOptions.Method to sslvSSLv23.
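The three steps above can also be done in code instead of in the Object Inspector. This is a minimal console sketch, assuming Indy 10 (where the SSL method lives under SSLOptions.Method) and OpenSSL DLLs that the program can load; the program name and the CreateRedirectingHTTP helper are hypothetical, and the URL is the one from the question, which may no longer behave the same way.

```pascal
program RedirectSetup;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: builds a TIdHTTP configured like steps 1-3 above.
function CreateRedirectingHTTP: TIdHTTP;
var
  SSL: TIdSSLIOHandlerSocketOpenSSL;
begin
  Result := TIdHTTP.Create(nil);
  // Step 1: the SSL IOHandler is owned by the TIdHTTP, so freeing the
  // TIdHTTP also frees the handler
  SSL := TIdSSLIOHandlerSocketOpenSSL.Create(Result);
  // Step 3: in Indy 10 the Method property is nested under SSLOptions
  SSL.SSLOptions.Method := sslvSSLv23;
  // Step 2: HTTPS through the SSL handler, cookies on, follow redirects
  Result.IOHandler := SSL;
  Result.AllowCookies := True;
  Result.HandleRedirects := True;
end;

var
  HTTP: TIdHTTP;
  Output: TMemoryStream;
begin
  HTTP := CreateRedirectingHTTP;
  try
    Output := TMemoryStream.Create;
    try
      HTTP.Get('https://www.google.com/adsense/', Output);
      Output.SaveToFile('a.html');
      Writeln('Final response code: ', HTTP.ResponseCode);
    finally
      Output.Free;
    end;
  finally
    HTTP.Free;
  end;
end.
```

Creating the components in code makes it easy to see every setting in one place, which helps when comparing the behaviour with what Fiddler shows for the browser.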

Finally I have this code:

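procedure TfmMain.GetUrlToFile(AURL, AFile: string);
var
  Output: TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    // Download AURL (the parameter, not the FURL field) into memory; with
    // HandleRedirects = True, TIdHTTP follows 3xx responses before returning
    IdHTTP1.Get(AURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;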

However, it does not get to the login page as expected. I would expect it to behave like a web browser and follow the redirects until it reaches the final page.

This is the output of the headers from Fiddler:

HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block

Firstly, is there anything wrong with this output?

Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?

Comments (3)

深海夜未眠 2024-10-16 11:50:40

IdHTTP component property values prior to making the call:

    Name := 'IdHTTP1';
    IOHandler := IdSSLIOHandlerSocketOpenSSL1;
    AllowCookies := True;
    HandleRedirects := True;
    RedirectMaximum := 35;
    Request.UserAgent := 
      'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.' +
      '0b8';
    HTTPOptions := [hoForceEncodeParams];
    OnRedirect := IdHTTP1Redirect;
    CookieManager := IdCookieManager1;

Redirect event handler:

procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string; var
    NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
   Handled := True;
end;

Making the call:

  FURL := 'https://www.google.com';

  GetUrlToFile( (FURL + '/adsense/'), 'a.html');




  procedure TfmMain.GetUrlToFile(AURL, AFile : String);
  var
   Output : TMemoryStream;
  begin
    Output := TMemoryStream.Create;
    try
      try
       IdHTTP1.Get(AURL, Output);
       IdHTTP1.Disconnect;
      except

      end;
      Output.SaveToFile(AFile);
    finally
      Output.Free;
    end;
  end;

Here's the (request and response headers) output from Fiddler:

[Screenshot of the Fiddler request and response headers; the image is not available.]

丑疤怪 2024-10-16 11:50:40

Getting redirects going

Set TIdHTTP.HandleRedirects := True so that it automatically follows redirects.

TIdHTTP.RedirectMaximum is used to set how many successive redirects should be handled.

Alternatively, you can assign TIdHTTP.OnRedirect and set Handled := True in that handler. This is what I'm doing in a project that needs to read data from a WikiMedia web site (my own site).
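Building on the OnRedirect approach, the handler can also log each hop before accepting it, which makes it easy to compare Indy's redirect chain with the one Fiddler shows for the browser. A minimal console sketch, assuming Indy 10 (where VMethod is a string-typed method name, as in the handlers in this thread); TRedirectLogger is a hypothetical class, needed only because Indy event handlers must be methods of an object. HandleRedirects is also set to True here so the redirects are followed either way.

```pascal
program RedirectLogger;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL;

type
  // Hypothetical helper class: Indy event handlers must be "of object" methods
  TRedirectLogger = class
  public
    Hops: TStringList;
    constructor Create;
    destructor Destroy; override;
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
  end;

constructor TRedirectLogger.Create;
begin
  inherited Create;
  Hops := TStringList.Create;
end;

destructor TRedirectLogger.Destroy;
begin
  Hops.Free;
  inherited;
end;

procedure TRedirectLogger.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  // Record the redirect count, the HTTP method and the target URL
  Hops.Add(Format('%d %s %s', [NumRedirect, VMethod, dest]));
  // Tell TIdHTTP to follow this redirect
  Handled := True;
end;

var
  HTTP: TIdHTTP;
  Logger: TRedirectLogger;
begin
  Logger := TRedirectLogger.Create;
  HTTP := TIdHTTP.Create(nil);
  try
    HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    HTTP.AllowCookies := True;
    HTTP.HandleRedirects := True;
    HTTP.RedirectMaximum := 35;
    HTTP.OnRedirect := Logger.HandleRedirect;
    HTTP.Get('https://www.google.com/adsense/');
    Writeln(Logger.Hops.Text);
  finally
    HTTP.Free;
    Logger.Free;
  end;
end.
```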

About the HTTP response

There is nothing wrong with that response; it's a very basic redirect to https://encrypted.google.com/. TIdHTTP should follow it and request that page next. The response also sets some cookies.
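What "following" that redirect involves can be seen by doing it by hand: turn HandleRedirects off, pass the 3xx codes in Get's AIgnoreReplies parameter so TIdHTTP does not raise an exception for them, then read the Location header and request that URL yourself. A minimal sketch, assuming Indy 10 and that every Location header is an absolute URL like the one in the question (relative values would need resolving first, which this sketch does not do); the IsRedirect helper and the limit of 10 hops are arbitrary choices for illustration.

```pascal
program ManualRedirects;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL, IdCookieManager;

// Hypothetical helper: the status codes treated as redirects in this sketch
function IsRedirect(Code: Integer): Boolean;
begin
  Result := (Code = 301) or (Code = 302) or (Code = 303) or (Code = 307);
end;

var
  HTTP: TIdHTTP;
  URL: string;
  Body: TMemoryStream;
  Hop: Integer;
begin
  HTTP := TIdHTTP.Create(nil);
  Body := TMemoryStream.Create;
  try
    HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    HTTP.CookieManager := TIdCookieManager.Create(HTTP);
    HTTP.AllowCookies := True;
    HTTP.HandleRedirects := False; // follow the hops ourselves
    URL := 'https://www.google.com/adsense/';
    for Hop := 1 to 10 do
    begin
      Body.Clear;
      // Listing the redirect codes in AIgnoreReplies stops TIdHTTP from
      // raising EIdHTTPProtocolException for them
      HTTP.Get(URL, Body, [301, 302, 303, 307]);
      Writeln(HTTP.ResponseCode, ' ', URL);
      if not IsRedirect(HTTP.ResponseCode) then
        Break;
      // Assumes an absolute URL, as in "Location: https://encrypted.google.com/"
      URL := HTTP.Response.Location;
    end;
    Body.SaveToFile('a.html');
  finally
    Body.Free;
    HTTP.Free;
  end;
end.
```

In normal use HandleRedirects := True does all of this internally; the manual loop is mainly useful for printing each hop next to a Fiddler trace.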

Other suggestions

Don't forget to assign a CookieManager, and make sure you use the same CookieManager for all subsequent requests. If you don't, you'll probably get redirected to the login page over and over again.
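A minimal sketch of that advice, assuming Indy 10: one TIdCookieManager is created once and stays assigned to the same TIdHTTP for every request, so cookies set by earlier responses (like the PREF and NID cookies in the question's headers) are sent again on later requests. The login URL, the form field names and the report URL are hypothetical placeholders, not AdSense's real endpoints.

```pascal
program SharedCookies;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL, IdCookieManager;

var
  HTTP: TIdHTTP;
  Cookies: TIdCookieManager;
  Params: TStringList;
  Report: string;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    HTTP.HandleRedirects := True;
    HTTP.AllowCookies := True;
    // One cookie manager, owned by the TIdHTTP and reused for every request
    Cookies := TIdCookieManager.Create(HTTP);
    HTTP.CookieManager := Cookies;

    Params := TStringList.Create;
    try
      // Hypothetical form fields for a hypothetical login page
      Params.Add('user=me@example.com');
      Params.Add('password=secret');
      HTTP.Post('https://www.example.com/login', Params);
    finally
      Params.Free;
    end;

    // Same TIdHTTP and same CookieManager, so the session cookies are sent
    Report := HTTP.Get('https://www.example.com/report');
    Writeln('Report length: ', Length(Report));
  finally
    HTTP.Free;
  end;
end.
```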

盛夏尉蓝 2024-10-16 11:50:40

In my case I needed to fix dest, because for some reason it contained a ';'!

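procedure Tfrm1.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
var
  i: Integer;
begin
  // Cut dest off at the first ';', dropping that character and everything after it
  i := Pos(';', dest);
  if i > 0 then
    dest := Copy(dest, 1, i - 1);

  // Tell TIdHTTP to follow the (possibly modified) redirect target
  Handled := True;
end;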
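The same truncation can be pulled out into a plain function so it can be checked without a network round-trip. A minimal sketch; the name StripAfterSemicolon is hypothetical. Like the handler above, it cuts at the first ';' anywhere in the URL, so it would also truncate a legitimate URL that uses ';' in its path or query.

```pascal
program StripDest;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns ADest up to, but not including, the first ';'.
// If there is no ';', ADest is returned unchanged.
function StripAfterSemicolon(const ADest: string): string;
var
  i: Integer;
begin
  i := Pos(';', ADest);
  if i > 0 then
    Result := Copy(ADest, 1, i - 1)
  else
    Result := ADest;
end;

begin
  Writeln(StripAfterSemicolon('https://example.com/page;jsessionid=ABC123'));
  // prints https://example.com/page
  Writeln(StripAfterSemicolon('https://example.com/page'));
  // prints https://example.com/page
end.
```

Inside the OnRedirect handler, the body then reduces to dest := StripAfterSemicolon(dest); followed by Handled := True;.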