Python script to see if a web page exists without downloading the whole page?
I'm trying to write a script that tests whether a web page exists. It would be nice if it could check without downloading the whole page.
This is my starting point. I've seen multiple examples use httplib in the same way; however, every site I check simply returns False.
```python
import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__=="__main__":
    print checkUrl("http://www.stackoverflow.com") # True
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False
```
Any ideas?
Edit
Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?
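```python
import urllib2

def url_exists(some_url):
    # Wrapped in a function (name is illustrative) so that `return` is valid.
    try:
        urllib2.urlopen(some_url)
        return True
    except urllib2.URLError:
        # HTTPError (e.g. a 404) is a subclass of URLError, so it lands here too.
        return False
```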
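On the question in the edit: `urlopen(url)` sends a GET request, so the server responds with the whole page even if the code never calls `.read()` on the result. A minimal Python 3 sketch (in Python 3, `urllib2` was split into `urllib.request` and `urllib.error`) that sends a HEAD request instead, so only the status line and headers are requested. The name `url_exists` and the 5-second timeout are illustrative assumptions; `urlopen` follows redirects automatically.

```python
import urllib.request


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` gets a non-error response."""
    try:
        # method="HEAD" asks for the status line and headers only; a plain
        # urlopen(url) would send GET and the server would send the body too.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError (4xx/5xx) are subclasses of OSError, as are
        # socket timeouts; ValueError covers URLs without a scheme.
        return False
```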
Answers (4)
How about this.
You can try
How about this:
This will send an HTTP HEAD request and return True if the response status code is below 400.
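A minimal sketch of that approach in Python 3 using only the standard-library `http.client` (the answer's own code is not preserved here). The helper names `head_status` and `url_exists` and the 5-second timeout are illustrative assumptions. Redirects are not followed, so a 3xx response counts as "exists"; a site that answers with a redirect instead of 200 OK is one way a check against `httplib.OK` alone, as in the question, can return False for a live page.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a single HEAD request to `url` and return the status code.

    Redirects are not followed, so a 301 or 302 is returned as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    else:
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    # Request target: the path ("/" if the URL has none) plus any query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # HEAD asks for the status line and headers only, no body.
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """True if the server answers the HEAD request with a status below 400."""
    try:
        return head_status(url, timeout) < 400
    except (OSError, http.client.HTTPException, ValueError):
        # Connection/DNS errors, timeouts, malformed responses, bad URLs.
        return False
```

If the redirect target itself must exist, the `Location` header would have to be followed manually, or a client that follows redirects used instead.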
Using `requests`, this is as simple as:

This just loads the website's headers. To test whether this was successful, you can check the result's `status_code`, or use the `raise_for_status` method, which raises an `HTTPError` if the response has a 4xx or 5xx status code.
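A minimal sketch of the `requests` approach described above (the answer's own snippet is not preserved here). It assumes the third-party `requests` package is installed; the function name `url_exists`, the 5-second timeout, and the choice to follow redirects are illustrative assumptions.

```python
import requests


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` ends in a non-error status."""
    try:
        # requests.head() does not follow redirects by default, so enable it
        # explicitly to check the final destination.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        # Raises requests.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
        return True
    except requests.RequestException:
        # Covers HTTPError, connection errors, timeouts and invalid URLs.
        return False
```

Catching `requests.RequestException` rather than only `HTTPError` means an unreachable host or a URL without a scheme also yields False instead of an exception.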