How do I use a script to access text on a web page that sits behind authentication?
I have a website where I can view information after logging in. I need to capture something it displays, for use in a script.
Installing software is not an option - I have to do this with the tools that come with Windows 10.
I tried Chrome's print-to-PDF feature, but it doesn't work with authentication: even though I logged in and navigated to the information I need, the printed page was just the login URL.
Apparently, PowerShell can use something called wscript to send keystrokes: bring the window to the foreground, select and copy everything, and dump it into a text file. I have no idea where to start with that, though.
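As far as I can tell, that approach would look roughly like this (a sketch only, untested against this site; it assumes the page is already open and logged in, in a window whose title contains "Mozilla Firefox"):
# Rough sketch of the SendKeys idea (untested).
$wshell = New-Object -ComObject WScript.Shell

# Bring the browser window to the foreground.
$null = $wshell.AppActivate("Mozilla Firefox")
Start-Sleep -Milliseconds 500

# Ctrl+A to select everything, Ctrl+C to copy it to the clipboard.
$wshell.SendKeys("^a")
Start-Sleep -Milliseconds 200
$wshell.SendKeys("^c")
Start-Sleep -Milliseconds 200

# Dump the clipboard contents into a text file.
Get-Clipboard | Out-File -FilePath "$env:USERPROFILE\page-text.txt" -Encoding UTF8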
I tried to use Postman to build a request that would let me access that page. However, even with the correct credentials it reports:
anti forgery validation failed
While using Postman, I noticed that a cookie is downloaded when the login page is opened (before I log in). I checked in Firefox's developer tools, and the login page provides this cookie, called __H2RequestVerification. When making the login request, the browser POSTs the username, password, and this cookie (which is a long random string of letters and numbers).
I tried to do this in Postman manually, but when I get to the part where the credentials are supplied, I always get a "connection reset" error, even when supplying the token in the cookie.
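For reference, this is roughly the flow I think I would need to reproduce with the built-in Invoke-WebRequest (a sketch only; the field names are taken from the form data shown further down, and I have not gotten it to work):
# Sketch of the login flow with Invoke-WebRequest (unverified).
$loginUrl = 'https://data-demo.xxx.ac.uk/account/login?ReturnUrl=%2F'

# 1. GET the login page; -SessionVariable keeps the __H2RequestVerification cookie.
$loginPage = Invoke-WebRequest -Uri $loginUrl -SessionVariable session -UseBasicParsing

# 2. Pull the hidden anti-forgery token out of the login form's HTML.
$token = ([regex]'name="__RequestVerificationToken"[^>]*value="([^"]+)"').Match($loginPage.Content).Groups[1].Value

# 3. POST the credentials plus the token, reusing the same session (and cookie).
$body = @{
    __RequestVerificationToken = $token
    EmailOrUsername            = '123@abc'
    Password                   = 'aPassWord'
}
$response = Invoke-WebRequest -Uri $loginUrl -Method Post -Body $body -WebSession $session -UseBasicParsing

# 4. Further requests made with -WebSession $session should then be authenticated.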
Raw request from Postman, in curl format (this does not work):
curl --location 'https://data-demo.xxx.ac.uk/account/login?ReturnUrl=%2F' \
--header 'Host: data-demo.xxx.ac.uk' \
--header 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0' \
--header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' \
--header 'Accept-Language: en-GB,en;q=0.5' \
--header 'Accept-Encoding: gzip, deflate, br' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Content-Length: 182' \
--header 'Origin: https://data-demo.xxx.ac.uk' \
--header 'DNT: 1' \
--header 'Connection: keep-alive' \
--header 'Referer: https://data-demo.xxx.ac.uk/account/login?ReturnUrl=%2F' \
--header 'Cookie: __H2RequestVerification=Wj3e8tH-8ikvaghOBS0k5x0Vd9X74CRhVRw5Ch9BgNwLIkfGYNI0Do9stFyI0B0yVoq6BQIeJZTGqApRs8Tb3tx0sMg1' \
--header 'Upgrade-Insecure-Requests: 1' \
--header 'Sec-Fetch-Dest: document' \
--header 'Sec-Fetch-Mode: navigate' \
--header 'Sec-Fetch-Site: same-origin' \
--header 'Sec-Fetch-User: ?1' \
--header 'Sec-GPC: 1' \
--header 'TE: trailers' \
--form '__RequestVerificationToken="JtyADE1k-gov_-IYAGMh4urwLI0GK32wlltEZUPetV2TPSMpLE1vY7L8qBkn-Z9sWfcQl9vZfWukq04C55Oj9cFBRkU1"' \
--form 'EmailOrUsername="abc@123"' \
--form '.xxx="aPassWord"'
I don't know how to copy just the raw HTTP request out of Firefox, though I presume there must be a way. To be clear, the request below is the one that works.
Here are the headers:
Host: data-demo.xxx.ac.uk
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 182
Origin: https://data-demo.xxx.ac.uk
DNT: 1
Connection: keep-alive
Referer: https://data-demo.xxx.ac.uk/account/login
Cookie: __H2RequestVerification=Wj3e8tH-8ikvaghOBS0k5x0Vd9X74CRhVRw5Ch9BgNwLIkfGYNI0Do9stFyI0B0yVoq6BQIeJZTGqApRs8Tb3tx0sMg1
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Sec-GPC: 1
TE: trailers
Here is the form data:
__RequestVerificationToken "u9tHCizsNnw0iZ4olHk5gt7gAqMCDEDrcQvZWM08TdT-U10NRfuEU2B8leZ4TU5Eq8UzE8YsfEemwvr8xCcHnVFJKnU1"
EmailOrUsername "123@abc"
Password "aPassWord"
And the cookie:
__H2RequestVerification "Wj3e8tH-8ikvaghOBS0k5x0Vd9X74CRhVRw5Ch9BgNwLIkfGYNI0Do9stFyI0B0yVoq6BQIeJZTGqApRs8Tb3tx0sMg1"
Comments (2)
You can indeed use Selenium; here's an idea:
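A minimal sketch of what that could look like from PowerShell, assuming the Selenium .NET WebDriver assembly (WebDriver.dll) and a matching chromedriver.exe have been downloaded; the element names and the target URL below are guesses based on the form data in the question:
# Minimal sketch: drive a real Chrome browser via the Selenium .NET bindings.
Add-Type -Path 'C:\selenium\WebDriver.dll'

$driver = New-Object OpenQA.Selenium.Chrome.ChromeDriver
$driver.Navigate().GoToUrl('https://data-demo.xxx.ac.uk/account/login?ReturnUrl=%2F')

# Fill in the login form and submit it; the anti-forgery token is handled
# by the real browser, exactly as in an interactive session.
$driver.FindElement([OpenQA.Selenium.By]::Name('EmailOrUsername')).SendKeys('123@abc')
$driver.FindElement([OpenQA.Selenium.By]::Name('Password')).SendKeys('aPassWord')
$driver.FindElement([OpenQA.Selenium.By]::CssSelector("button[type='submit']")).Click()

# Navigate to the page you need (placeholder URL) and capture its rendered text.
$driver.Navigate().GoToUrl('https://data-demo.xxx.ac.uk/the-page-you-need')
$pageBody = $driver.FindElement([OpenQA.Selenium.By]::TagName('body'))
$pageBody.Text | Out-File "$env:USERPROFILE\captured.txt" -Encoding UTF8

$driver.Quit()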
Checks like the one behind "anti forgery validation failed" can detect your attempts because the site uses JavaScript to load the data after the initial page load.
The only way to scrape sites like this is to use a program that drives a real browser, such as Selenium (see this question).
You cannot do this without installing Selenium or some other software that can run the JavaScript on the page.