Indy - IdHttp how to handle page redirects?
Using: Delphi 2010, latest version of Indy
I am trying to scrape data from Google's AdSense web page, with the aim of retrieving the reports. However, I have been unsuccessful so far. It stops after the first request and does not proceed.
Using Fiddler to debug the traffic/requests to the Google AdSense website, and a web browser to load the AdSense page, I can see that the request (from the web browser) generates a number of redirects until the page is loaded.
However, my Delphi application only generates a couple of requests before it stops.
Here are the steps I have followed:
- Drop an IdHTTP and an IdSSLIOHandlerSocketOpenSSL1 component on the form.
- Set the IdHTTP component properties AllowCookies and HandleRedirects to True, and the IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
- Set the IdSSLIOHandlerSocketOpenSSL1 component property Method := sslvSSLv23
Finally I have this code:
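procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    // Download the response body for AURL into the memory stream
    IdHTTP1.Get(AURL, Output);
    // Write the downloaded bytes to the target file
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;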
However, it does not get to the login page as expected. I would expect it to behave as if it were a web browser and proceed through the redirects until it reaches the final page.
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block
Firstly, is there anything wrong with this output?
Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?
Comments (3)
IdHTTP component property values prior to making the call:
Redirect event handler:
Making the call:
Here's the (request and response headers) output from Fiddler:
Getting redirects going
Set TIdHTTP.HandleRedirects := True so it starts automatically handling redirects. TIdHTTP.RedirectMaximum is used to set how many successive redirects should be handled.
Alternatively, you may assign TIdHTTP.OnRedirect and set Handled := True from that handler. This is what I'm doing in a project that needs to read data from a WikiMedia web site (my own site).
About the HTTP response
There is nothing wrong with that response; it's a very basic redirect to https://encrypted.google.com/. TIdHTTP should go to the given page in response. It also sets some cookies.
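To check which Location each 3xx response points to, a minimal sketch along these lines logs every hop from an OnRedirect handler and tells TIdHTTP to follow it. The helper class TRedirectTracer, the RedirectMaximum value of 15 and the example.com URL are assumptions made for illustration and do not come from the question; the OpenSSL DLLs also need to be available at run time.

```pascal
program RedirectTrace;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL;

type
  // Hypothetical helper: OnRedirect needs a method of an object,
  // not a plain procedure.
  TRedirectTracer = class
  public
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean;
      var VMethod: TIdHTTPMethod);
  end;

procedure TRedirectTracer.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean;
  var VMethod: TIdHTTPMethod);
begin
  // dest is the redirect target taken from the response's Location header.
  Writeln(Format('Redirect %d -> %s', [NumRedirect, dest]));
  Handled := True; // ask TIdHTTP to follow this redirect
end;

var
  Http: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
  Tracer: TRedirectTracer;
  Body: TMemoryStream;
begin
  Http := TIdHTTP.Create(nil);
  SSL := TIdSSLIOHandlerSocketOpenSSL.Create(Http); // owned and freed by Http
  Tracer := TRedirectTracer.Create;
  Body := TMemoryStream.Create;
  try
    SSL.SSLOptions.Method := sslvSSLv23;
    Http.IOHandler := SSL;
    Http.HandleRedirects := True;
    Http.RedirectMaximum := 15; // assumed limit, adjust as needed
    Http.OnRedirect := Tracer.HandleRedirect;
    try
      // Placeholder URL, not the real AdSense address.
      Http.Get('https://example.com/', Body);
      Writeln('Final response code: ', Http.ResponseCode);
    except
      on E: Exception do
        Writeln(E.ClassName, ': ', E.Message);
    end;
  finally
    Body.Free;
    Tracer.Free;
    Http.Free;
  end;
end.
```

Comparing the logged targets with the Fiddler trace from the browser should show at which hop the two diverge.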
Other suggestions
Don't forget to assign a CookieManager and make sure you use the same CookieManager for all subsequent requests. If you don't, you'll probably get redirected to the login page over and over again.
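The CookieManager suggestion could look roughly like the sketch below, in which one TIdCookieManager is attached to a single TIdHTTP instance and reused for consecutive requests. Both example.com URLs and the two-step request order are placeholders assumed for illustration; the real login flow is not described in the question.

```pascal
program SharedCookies;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdSSLOpenSSL, IdCookieManager;

var
  Http: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
  Cookies: TIdCookieManager;
  Page: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    // Both helpers are owned by Http and are freed together with it.
    SSL := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Cookies := TIdCookieManager.Create(Http);

    SSL.SSLOptions.Method := sslvSSLv23;
    Http.IOHandler := SSL;
    Http.AllowCookies := True;
    Http.CookieManager := Cookies; // one cookie store for every request
    Http.HandleRedirects := True;

    try
      // First request: cookies set by the server land in Cookies.
      Page := Http.Get('https://example.com/login');
      // Second request: the same Http and Cookies objects are reused,
      // so matching cookies from the first response are sent back.
      Page := Http.Get('https://example.com/reports');
      Writeln('Received ', Length(Page), ' characters');
    except
      on E: Exception do
        Writeln(E.ClassName, ': ', E.Message);
    end;
  finally
    Http.Free;
  end;
end.
```

If several TIdHTTP instances are used instead of one, pointing each instance's CookieManager property at the same TIdCookieManager keeps the session cookies shared between them.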
In my case I needed to fix dest, because somehow it had a ';' in it!
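The comment does not say how dest was repaired, so the following is only one possible sketch: it assumes that everything from the first ';' onward is unwanted and cuts it off inside an OnRedirect handler. StripFromSemicolon and TDestFixer are hypothetical names introduced here, and the URL is a made-up example.

```pascal
program FixDestSketch;

{$APPTYPE CONSOLE}

uses
  IdHTTP;

type
  // Hypothetical helper class that carries the OnRedirect handler.
  TDestFixer = class
  public
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean;
      var VMethod: TIdHTTPMethod);
  end;

// Assumption: the part of the URL starting at the first ';' is garbage.
function StripFromSemicolon(const AUrl: string): string;
var
  P: Integer;
begin
  P := Pos(';', AUrl);
  if P > 0 then
    Result := Copy(AUrl, 1, P - 1)
  else
    Result := AUrl;
end;

procedure TDestFixer.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean;
  var VMethod: TIdHTTPMethod);
begin
  // dest is a var parameter, so the cleaned value is what TIdHTTP follows.
  dest := StripFromSemicolon(dest);
  Handled := True;
end;

begin
  // Demonstrates the helper on its own, without any network access.
  Writeln(StripFromSemicolon('https://example.com/a;jsessionid=1'));
  // prints https://example.com/a
end.
```

To use the handler, create a TDestFixer instance and assign its HandleRedirect method to the OnRedirect event of the TIdHTTP component before calling Get.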