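How do I log in to a website using Python?
How can I do this? I'm trying to open some specific links (with urllib), but to do that I need to be logged in.
I got this source from the website: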
<form id="login-form" action="auth/login" method="post">
<div>
<!--label for="rememberme">Remember me</label><input type="checkbox" class="remember" checked="checked" name="remember me" /-->
<label for="email" id="email-label" class="no-js">Email</label>
<input id="email-email" type="text" name="handle" value="" autocomplete="off" />
<label for="combination" id="combo-label" class="no-js">Combination</label>
<input id="password-clear" type="text" value="Combination" autocomplete="off" />
<input id="password-password" type="password" name="password" value="" autocomplete="off" />
<input id="sumbitLogin" class="signin" type="submit" value="Sign In" />
Is this possible?
Comments (7)
Maybe you want to use twill. It's quite easy to use and should be able to do what you want.
It will look like the following:
Once you have used go… to browse to the site you want to log in to, you can use showforms() to list all forms. Just try it from the Python interpreter.
Let me try to make it simple. Suppose the URL of the site is www.example.com and you need to log in by filling in a username and password. Go to the login page, say http://www.example.com/login.php, view its source code and search for the action URL. It will be in the form tag, as the value of its action attribute, for example userinfo.php.
Now take userinfo.php and make the absolute URL, which will be 'http://www.example.com/userinfo.php', then run a simple Python script.
I hope that this helps someone somewhere someday.
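The "find the action URL and make it absolute" step can also be done in code rather than by reading the source by hand. Here is a minimal sketch using only the standard library (Python 3 module names). The page HTML, the field names username and password, and the FormActionParser class are made up for illustration; a real page will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.field_names = []
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Only inputs with a name attribute are sent to the server.
            self.field_names.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Hypothetical login page, standing in for what "view source" shows.
LOGIN_PAGE_URL = "http://www.example.com/login.php"
LOGIN_PAGE_HTML = """
<form action="userinfo.php" method="post">
  <input type="text" name="username" />
  <input type="password" name="password" />
  <input type="submit" value="Log in" />
</form>
"""

parser = FormActionParser()
parser.feed(LOGIN_PAGE_HTML)

# Resolve the relative action against the page URL to get the absolute URL.
action_url = urljoin(LOGIN_PAGE_URL, parser.action)
print(action_url)
print(parser.field_names)
```

Using urljoin instead of gluing strings together means a relative action, a root-relative one such as /auth/login, or an already absolute one all resolve correctly.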
Typically you'll need cookies to log into a site, which means cookielib, urllib and urllib2. Here's a class which I wrote back when I was playing Facebook web games:
You won't necessarily need the HTTPS or Redirect handlers, but they don't hurt, and they make the opener much more robust. You also might not need cookies, but it's hard to tell just from the form that you've posted. I suspect that you might, purely from the 'Remember me' input that's been commented out.
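The class from that answer is not reproduced on this page. As a rough idea of what a cookie-aware opener looks like, here is a minimal sketch. It assumes Python 3, where cookielib became http.cookiejar and urllib2 became urllib.request. The function names build_cookie_opener and login, and the URL and credentials in the comments, are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        # build_opener adds redirect and HTTPS handlers by default;
        # they are listed here only to mirror the answer above.
        urllib.request.HTTPRedirectHandler(),
        urllib.request.HTTPSHandler(),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def login(opener, login_url, fields):
    """POST the form fields; cookies set by the server end up in the jar."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return opener.open(login_url, data=body, timeout=10)


opener, jar = build_cookie_opener()

# Usage (hypothetical URL and credentials, not executed here):
#   response = login(opener, "http://www.example.com/auth/login",
#                    {"handle": "me@example.com", "password": "secret"})
#   page = opener.open("http://www.example.com/some/protected/page")
```

As long as every later request goes through the same opener, the session cookie from the login response is sent back automatically.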
Web page automation? Definitely "webbot".
webbot even works on web pages that have dynamically changing id and class names, and has more methods and features than selenium or mechanize. The docs are also pretty straightforward and simple to use: https://webbot.readthedocs.io
For more information, visit: https://docs.python.org/2/library/urllib2.html
Websites in general can check authorization in many different ways, but the one you're targeting seems to make it reasonably easy for you.
All you need is to POST to the auth/login URL a form-encoded blob with the various fields you see there (forget the for attributes on the labels, they're decoration for human visitors): handle=whatever&password=pwd and so on. Only inputs that carry a name attribute are sent, which is why the field is password and not the password-clear id. As long as you know the values for the handle (AKA email) and password you should be fine.
Presumably that POST will redirect you to some "you've successfully logged in" page with a Set-Cookie header validating your session (be sure to save that cookie and send it back on further interaction along the session!).
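To make the "form-encoded blob" concrete, here is a small sketch that builds it and wraps it in a POST request without sending anything. The field names handle and password are the name attributes from the form in the question. The site root and the credentials are placeholders, and the module names assume Python 3.

```python
import urllib.parse
import urllib.request

# Hypothetical site root and credentials; only the field names come from the form.
BASE_URL = "http://www.example.com/"
fields = {"handle": "me@example.com", "password": "secret"}

# urlencode produces the form-encoded blob, percent-escaping special characters.
blob = urllib.parse.urlencode(fields)
print(blob)

# Passing data= makes urllib send a POST instead of a GET.
request = urllib.request.Request(
    urllib.parse.urljoin(BASE_URL, "auth/login"),
    data=blob.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(request.get_method(), request.full_url)
```

Sending it is then a matter of passing request to urllib.request.urlopen, or to an opener that keeps cookies if the session has to survive later requests.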
For HTTP things, the current choice should be Requests: HTTP for Humans
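With Requests, the same login might look roughly like the sketch below. It assumes the third-party requests package is installed. The site root and credentials are placeholders, and the request is only prepared, not sent, so the snippet runs without a real server.

```python
import requests

# Hypothetical site root and credentials; the field names come from the form.
BASE_URL = "http://www.example.com/"

session = requests.Session()  # keeps cookies between requests

login_request = requests.Request(
    "POST",
    BASE_URL + "auth/login",
    data={"handle": "me@example.com", "password": "secret"},
)
# prepare_request builds the exact HTTP request without sending it.
prepared = session.prepare_request(login_request)
print(prepared.method, prepared.url)
print(prepared.body)

# Sending is one more call (not executed here, since the URL is made up):
#   response = session.send(prepared)
#   page = session.get(BASE_URL + "some/protected/page")
```

In everyday use the prepare step is usually skipped in favour of session.post(url, data=...), which encodes the form, follows redirects and stores the session cookie in one call.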