How to get the correct HTML code from a specific URL (Python)
I'm trying to write code that can verify a domain through whois.domaintools.com.
But there's a small problem with reading the HTML: what I get does not match the page source of whois.domaintools.com/notregistereddomain.com. What's wrong? Is it a problem with the request, or something else? I really don't know how to solve it.
    import urllib2

    def getPage():
        url = "http://whois.domaintools.com/notregistereddomain.com"
        req = urllib2.Request(url)
        try:
            response = urllib2.urlopen(req)
            return response.read()
        except urllib2.HTTPError, error:
            print "error: ", error.read()
            a = error.read()
            f = open("URL.txt", "a")
            f.write(a)
            f.close()

    if __name__ == "__main__":
        namesPage = getPage()
        print namesPage
Comments (1)
If you use

    print error

instead of

    print error.read()

you'll see that you're getting an HTTP 403 "Forbidden" answer from the server. Apparently this server doesn't like requests without a User-Agent header (or it doesn't like Python's default one, because it doesn't want to be queried from a script). Here's a workaround:
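A minimal sketch of one such workaround, assuming the server only checks for a browser-like User-Agent header. It is written for Python 3, where urllib2 became urllib.request and urllib.error. The USER_AGENT value and the helper names build_request and get_page are illustrative, not taken from the question, and whether this particular server accepts the request is not guaranteed.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like value; any string the server accepts would do.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def get_page(url):
    """Fetch url and return the body as bytes, or None on an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # Printing the exception itself shows the status code and reason,
        # which is more useful here than the error page body.
        print("error:", error)
        return None
```

Calling get_page("http://whois.domaintools.com/notregistereddomain.com") would then send the request with the custom header. If the server still answers 403, it is probably rejecting scripted access by some other means, and changing the header will not help.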