Mechanize response returns no content

Posted 2025-01-03 08:45:27

I'm using Mechanize in Python to perform some web scraping. Most of the website works, but one particular page doesn't return any content or response.

My settings are

self._browser = mechanize.Browser()
self._browser.set_handle_refresh(True)  
self._browser.set_debug_responses(True)
self._browser.set_debug_redirects(True)  
self._browser.set_debug_http(True)

and the code to execute is:

response = self._browser.open(url)

This is the debug output:

add_cookie_header
Checking xyz.com for cookies to return
- checking cookie path=/
 - checking cookie <Cookie ASP.NET_SessionId=j3pg0wnavh3yjseyj1v3mr45 for xyz.com/>
   it's a match
send: 'GET /page.aspx?leagueID=39 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: xyz.com\r\nCookie: ASP.NET_SessionId=aapg9wnavh3yqyrtg1v3ar45\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 07 Feb 2012 19:04:37 GMT
header: Pragma: no-cache
header: Expires: -1
header: Connection: close
header: Cache-Control: no-cache
header: Content-Length: 0
extract_cookies: Date: Tue, 07 Feb 2012 19:04:37 GMT
Pragma: no-cache
Expires: -1
Connection: close
Cache-Control: no-cache
Content-Length: 0

I've tried with and without redirect handling, to no avail. Any ideas?

I might add that the page works fine in a browser.
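Nothing in the trace above is malformed; the tell is the Content-Length: 0 header on a 200 OK reply, meaning the server deliberately sent an empty body. A minimal sketch of spotting that programmatically, using the header lines copied from the debug dump above:

```python
# Parse the raw response headers from mechanize's debug output and flag
# the empty body. The header text is copied from the dump above.
raw_headers = """\
Date: Tue, 07 Feb 2012 19:04:37 GMT
Pragma: no-cache
Expires: -1
Connection: close
Cache-Control: no-cache
Content-Length: 0"""

headers = {}
for line in raw_headers.splitlines():
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

# A 200 OK with Content-Length: 0 means the server chose to send nothing.
empty_body = headers.get("content-length") == "0"
print(empty_body)  # prints True
```

So the HTTP exchange itself succeeded; the question is why the server decides to return an empty body to this client but not to a browser.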


Comments (1)

平生欢 2025-01-10 08:45:27

The procedure to find out what's the problem usually is this one:

  1. Capture your web browser traffic when successfully opening the url
  2. Capture python traffic when trying to open the url

For the first step, there are many tools available. For example, in Firefox, HttpFox and Live HTTP Headers might be quite useful.

For the second step, programmatically logging the headers being sent/received should be enough.

For both steps, you can also capture the traffic on your network card with something like Wireshark.
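Once you have both captures, the comparison boils down to a set difference over header names. A sketch of that comparison; the header values below are hypothetical placeholders standing in for real captures, except for the script side, which mirrors the request line in the question's debug output:

```python
# Hypothetical browser capture (e.g. from HttpFox or Live HTTP Headers).
browser_headers = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "http://xyz.com/",
    "Cookie": "ASP.NET_SessionId=...",
}

# What the mechanize script actually sent, per its debug output.
script_headers = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept-Encoding": "identity",
    "Host": "xyz.com",
    "Cookie": "ASP.NET_SessionId=...",
    "Connection": "close",
}

# Headers the browser sends but the script does not: prime suspects when
# the server answers the script with an empty body.
missing = sorted(set(browser_headers) - set(script_headers))
print(missing)  # ['Accept', 'Accept-Language', 'Referer']
```

If the diff turns up headers like Referer or Accept-Language, try adding them to the mechanize request one at a time until the page starts returning content.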
