HttpClient - getting incorrect page source
I am using HttpClient and GetMethod to get the page source of this URL:
http://www.google.com/finance?chdnp=1&chdd=1&chds=1&chdv=1&chvs=Logarithmic&chdeh=0&chdet=1264263288788&chddm=391&chddi=120&chls=Ohlc&q=NSE:.NSEI&
But somehow I always end up getting the page source of:
http://www.google.com/finance?q=NSE:.NSEI
Can anyone tell me why this happens, and how to get the page source of the former URL?
Comments (1)
I'm going to go out on a limb here and assume that your HttpClient implementation handles HTTP redirects internally. So when you execute the GetMethod on the first URL, the server (google.com) is probably sending back an HTTP redirect (301 or 302) response pointing to the second URL, your client follows it, and the second page is what you end up getting back. The reason is probably that the first URL requires some sort of cookie which you're not providing when you make your request. The best way to determine exactly what happens is to use a tool such as Wireshark or Fiddler to capture the HTTP request/response sequence from your HttpClient, then compare it with that of a normal request made from Firefox or IE to see exactly what is different.
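If a packet sniffer is not handy, one way to test the redirect theory from code is to switch off automatic redirect following and look at the status code and Location header yourself. The sketch below is only an illustration under a few assumptions: it uses the JDK's own HttpURLConnection instead of the HttpClient library from the question, the class name RedirectProbe and its methods are made up for this example, and a throwaway local server stands in for google.com so the behaviour can be reproduced offline. With Commons HttpClient 3.x the equivalent switch should be setFollowRedirects(false) on the GetMethod, followed by reading the status code and the Location response header.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectProbe {

    /**
     * Sends a single GET without following redirects and returns
     * "<status> <Location header>" (the header is null when absent).
     */
    static String probe(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // surface the 3xx instead of following it
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            return status + " " + location;
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Hypothetical stand-in for the real site: /long answers with a 302
     * pointing at /short, and /short serves a small page.
     */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/long", exchange -> {
            exchange.getResponseHeaders().add("Location", "/short");
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
            exchange.close();
        });
        server.createContext("/short", exchange -> {
            byte[] body = "short page".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            // The long URL reports a redirect; the short one is served directly.
            System.out.println(probe(base + "/long?chls=Ohlc&q=NSE:.NSEI"));
            System.out.println(probe(base + "/short"));
        } finally {
            server.stop(0);
        }
    }
}
```

Pointing probe at the long URL from the question would show whether the server really answers with a 3xx and where it sends you. If it does, the next thing to compare against the browser's request is the set of cookies and headers being sent.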