Problem getting HTML source from a protected web page using Indy and Delphi 7
Using the Indy10 components for Delphi, I am grabbing the source of a web page and displaying it in a Memo control using the following code:
procedure TForm1.Button1Click(Sender: TObject);
begin
Memo1.Text := IdHTTP1.Get(Edit1.Text);
end;
When the page I am trying to display is just a normal page (i.e. no login required), the results are fine. But if I try to grab the source of a page that requires a login, the memo shows the source of the login page instead of the page I requested, even though I am logged in to the site in both Firefox and IE.
So my question is: how can I "authenticate" myself with the site using the Indy components, so that I get the same source I would see when viewing the page in my browser after logging in?
Thx,
Douglas
You should take some time to learn how HTTP as a whole works, because sometimes it is more complex than it looks. A browser does a lot more than simply issuing an HTTP GET or POST request with a URL and getting back some HTML. For example, browsers store a lot of information about visited sites, because HTTP headers carry a lot of useful information. How they store it is usually browser-specific rather than system-wide, so other browsers or applications may not be able to see or use it.
You have to set up the HTTP headers properly for a given site and handle situations like authentication. Sites with authentication can use a broad range of techniques to grant access, from simple login pages to HTTP authentication methods. They can redirect you to pages that handle authentication, and although this can happen transparently for an already logged-in user, a browser (or an application) still sees what is happening and must handle it.
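As a minimal sketch of the HTTP-authentication case, assuming the site protects the page with HTTP Basic authentication (not a login form) and that you use Indy 10 as in the question, a TIdHTTP can be configured to send credentials and follow redirects. The unit and function names (AuthClientSketch, CreateBasicAuthClient) are hypothetical. If the site is served over HTTPS, TIdHTTP additionally needs an SSL IOHandler such as TIdSSLIOHandlerSocketOpenSSL plus the OpenSSL DLLs, which this sketch leaves out.

```pascal
unit AuthClientSketch;

// Sketch only. ASSUMPTION: the page is protected by HTTP Basic
// authentication. If the site uses an HTML login form instead,
// these credentials are simply ignored by the server.

interface

uses
  IdHTTP;

function CreateBasicAuthClient(const AUser, APassword: string): TIdHTTP;

implementation

function CreateBasicAuthClient(const AUser, APassword: string): TIdHTTP;
begin
  Result := TIdHTTP.Create(nil);
  // Follow 3xx redirects (e.g. a bounce through an authentication page)
  // the way a browser does, instead of returning the redirect response.
  Result.HandleRedirects := True;
  // Send the user name and password in an "Authorization: Basic" header.
  Result.Request.BasicAuthentication := True;
  Result.Request.Username := AUser;
  Result.Request.Password := APassword;
end;

end.
```

The button handler from the question could then use it like this (with IdHTTP and AuthClientSketch in the form's uses clause; the credentials are placeholders):

```pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  HTTP: TIdHTTP;
begin
  HTTP := CreateBasicAuthClient('myuser', 'mypassword');
  try
    Memo1.Text := HTTP.Get(Edit1.Text);
  finally
    HTTP.Free;
  end;
end;
```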
That is because Indy only does the transport for you.
Indy does not do the login; the website does.
There are dozens of ways a website can do a login.
Most of those logins require the support of a web browser.
So you most likely need to simulate what a web browser does.
That includes supporting all the technologies the website uses for the login.
That might include cookies, extra HTTP headers, HTML5, JavaScript, Flash or other features.
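For the common case of a plain HTML login form, the cookie part can be sketched with Indy 10: keep one TIdHTTP with a cookie manager, POST the login form, then GET the protected page with the same object so the session cookie is sent back. This is a hedged sketch, not a general solution. The field names 'username' and 'password', the user-agent string, and the unit and function names are hypothetical placeholders (read the real field names, including any hidden fields, from the login page's HTML), and it cannot help when the login depends on JavaScript or Flash. Note that the cookies held by your Firefox or IE session are not reused; the program performs its own login.

```pascal
unit FormLoginSketch;

// Sketch only. ASSUMPTIONS: the login form posts to ALoginURL with fields
// named 'username' and 'password' (hypothetical names), has no hidden
// fields or tokens, and the site keeps the session in a cookie.

interface

uses
  Classes, IdHTTP, IdCookieManager;

function CreateBrowserLikeClient: TIdHTTP;
function LoginAndGet(AHTTP: TIdHTTP; const ALoginURL, AUser, APassword,
  APageURL: string): string;

implementation

function CreateBrowserLikeClient: TIdHTTP;
begin
  Result := TIdHTTP.Create(nil);
  // Store cookies from responses and send them back on later requests.
  // The manager is owned by (and freed with) the TIdHTTP instance.
  Result.CookieManager := TIdCookieManager.Create(Result);
  Result.AllowCookies := True;
  // Logins commonly answer with a redirect; follow it like a browser.
  Result.HandleRedirects := True;
  // Hypothetical browser-like user agent; some sites treat Indy's default differently.
  Result.Request.UserAgent := 'Mozilla/5.0 (compatible; IndyLoginSketch)';
end;

function LoginAndGet(AHTTP: TIdHTTP; const ALoginURL, AUser, APassword,
  APageURL: string): string;
var
  Params: TStringList;
begin
  Params := TStringList.Create;
  try
    // Hypothetical field names; copy the real ones from the login form.
    Params.Add('username=' + AUser);
    Params.Add('password=' + APassword);
    // Posting a TStrings sends an application/x-www-form-urlencoded body;
    // any session cookie in the response is kept by AHTTP.CookieManager.
    AHTTP.Post(ALoginURL, Params);
  finally
    Params.Free;
  end;
  // Same TIdHTTP, same cookies: request the protected page.
  Result := AHTTP.Get(APageURL);
end;

end.
```

In the question's button handler, this would mean creating the client with CreateBrowserLikeClient, calling LoginAndGet with the form's action URL, your credentials and Edit1.Text, assigning the result to Memo1.Text, and freeing the client in a try/finally block.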
Be prepared for a lot of work...
--jeroen