Python + Mechanize not working with Delicious
I'm using Mechanize and Beautiful Soup to scrape some data off Delicious:
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
mech = Browser()
url = "http://www.delicious.com/varunsrin"
page = mech.open(url)
html = page.read()
soup = BeautifulSoup(html)
print soup.prettify()
This works for most sites I throw it at, but fails on Delicious with the following output:
Traceback (most recent call last):
  File "C:\Users\Varun\Desktop\Python-3.py", line 7, in <module>
    page = mech.open(url)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "C:\Python26\lib\site-packages\mechanize\_mechanize.py", line 255, in _mech_open
    raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
C:\Program Files (x86)\ActiveState Komodo IDE 6\lib\support\dbgp\pythonlib\dbgp\client.py:1360: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  child = getattr(self.value, childStr)
C:\Program Files (x86)\ActiveState Komodo IDE 6\lib\support\dbgp\pythonlib\dbgp\client.py:456: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return apply(func, args)
Comments (1)
Take some of the tips for emulating a browser with python+mechanize from here. Adding addheaders and set_handle_robots appears to be the minimum required. With the code below, I get output:
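A minimal sketch of that approach, assuming Python 3 and a mechanize release that supports it. The user-agent string and the helper names make_browser and fetch are illustrative choices, not taken from the linked tips. set_handle_robots(False) stops mechanize from consulting robots.txt (the check that raised the 403 in the question), and addheaders replaces the default headers with browser-like ones.

```python
# Sketch only: requires the third-party mechanize package (pip install mechanize).
# USER_AGENT is an illustrative value; any realistic browser string should do.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
BROWSER_HEADERS = [("User-agent", USER_AGENT)]


def make_browser(headers=BROWSER_HEADERS):
    """Return a mechanize.Browser that skips robots.txt and sends the given headers."""
    # Imported here so the constants above can be reused without mechanize installed.
    import mechanize

    browser = mechanize.Browser()
    # mechanize checks robots.txt by default and raises
    # "HTTP Error 403: request disallowed by robots.txt" when the URL is disallowed.
    browser.set_handle_robots(False)
    # addheaders is a list of (name, value) tuples sent with every request.
    browser.addheaders = list(headers)
    return browser


def fetch(url):
    """Open url with the configured browser and return the raw response body."""
    browser = make_browser()
    response = browser.open(url)
    return response.read()
```

Calling fetch("http://www.delicious.com/varunsrin") would return the page body, which can then be passed to BeautifulSoup as in the question. Whether the site still answers at that URL is not something this sketch can guarantee.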