How can I use BeautifulSoup to extract a long string of text from some JavaScript on a web page?
I'm trying to write a script so I can log into a website, but in order to do that I need to present the captcha. The only way to get the direct image of the captcha from the URL is to extract the giant string named 'challenge', but for some reason I have not been able to do it with BeautifulSoup. What is the best way to extract the long string?
var RecaptchaState = {
site : '4LfjPgEA56AABAJExraAeYXdMbVhPcG__Hyv-URXF',
challenge : '03AHJ_VusE_PgNB0vfBpD2h53o8uGMt1MeKi9bzhOTsjt0ze7SKmHVNe8uADceoU3JLPjpp8cJCVDGiYKo1ho-r1JcV19tm26doUHqevixJjH8SZ26i4EWbUOQLEuODf0Kt6JI0ZhtfiIaIXDg9MhUyDCEt_qxFWbSHA',
is_incorrect : false,
programming_error : '',
error_message : '',
server : 'http://www.google.com/recaptcha/api/',
timeout : 18000
};
document.write('
<scr>
');
</scr>
Comments (2)
I'd just use a regular expression. I'm not certain about this, but I don't think BeautifulSoup parses JavaScript, only (X)HTML:
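A minimal sketch of that regular-expression approach (not the answerer's original snippet), assuming the page source is already held in a string. The names PAGE_SOURCE, CHALLENGE_RE and extract_challenge are illustrative, and the pattern assumes the value is single-quoted as in the question:

```python
import re

# Stand-in for the downloaded page source, shortened from the question.
# In a real script this would be the text of the HTTP response.
PAGE_SOURCE = """
var RecaptchaState = {
    site : '4LfjPgEA56AABAJExraAeYXdMbVhPcG__Hyv-URXF',
    challenge : '03AHJ_VusE_PgNB0vfBpD2h53o8uGMt1MeKi9bzhOTsjt0ze7SKmHVNe8uADceoU3JLPjpp8cJCVDGiYKo1ho-r1JcV19tm26doUHqevixJjH8SZ26i4EWbUOQLEuODf0Kt6JI0ZhtfiIaIXDg9MhUyDCEt_qxFWbSHA',
    is_incorrect : false,
    timeout : 18000
};
"""

# Match the key `challenge`, optional whitespace around the colon, then
# capture everything between the single quotes.
CHALLENGE_RE = re.compile(r"challenge\s*:\s*'([^']*)'")


def extract_challenge(source):
    """Return the challenge string found in `source`, or None if absent."""
    match = CHALLENGE_RE.search(source)
    return match.group(1) if match else None


# Show the value with its quotes, as a Python repr.
print(repr(extract_challenge(PAGE_SOURCE)))
```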
Gives:
'03AHJ_VusE_PgNB0vfBpD2h53o8uGMt1MeKi9bzhOTsjt0ze7SKmHVNe8uADceoU3JLPjpp8cJCVDGiYKo1ho-r1JcV19tm26doUHqevixJjH8SZ26i4EWbUOQLEuODf0Kt6JI0ZhtfiIaIXDg9MhUyDCEt_qxFWbSHA'
BeautifulSoup does not parse JavaScript; you need to do this with a regex or something similar.
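As one possible reading of "something similar", here is a rough sketch that skips regular expressions and uses plain string methods instead. The helper name extract_quoted_value and the shortened SAMPLE string are made up for illustration, and the approach assumes the key appears only once and its value is wrapped in single quotes:

```python
def extract_quoted_value(source, key):
    """Return the single-quoted value that follows `key`, or None."""
    # Drop everything up to and including the first occurrence of the key.
    _, found, rest = source.partition(key)
    if not found:
        return None
    # Skip ahead to the opening quote of the value.
    _, opening, rest = rest.partition("'")
    if not opening:
        return None
    # Everything before the next quote is the value itself.
    value, closing, _ = rest.partition("'")
    return value if closing else None


# Shortened stand-in for the script text; the real values are much longer.
SAMPLE = "var RecaptchaState = { site : 'abc', challenge : '03AHJ_VusE_PgNB0', timeout : 18000 };"

challenge = extract_quoted_value(SAMPLE, "challenge")
```

This only works for quoted values. An unquoted field such as timeout yields None here because no quote follows it, but in a longer script it could pick up the next quoted string instead, so the regex version is usually the safer choice.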