My goal is to read the contents of a table from a web page with the Python library MechanicalSoup.
My problem is that I am not able to authorize my requests properly,
so every request results in <Response [403]>.
Example website: https://opensea.io/rankings?sortBy=one_day_volume
>>> import mechanicalsoup
>>> browser = mechanicalsoup.StatefulBrowser()
>>> browser.open("https://opensea.io/rankings?sortBy=one_day_volume")
<Response [403]>
In browser.open(url).content I can see that I am being blocked by the site's access policies:
"The access policies of a site define which visits are allowed. Your current visit is not allowed according to those policies. Only the site owner can change site access policies."
I think it has something to do with the cookies or other parameters of my request,
which is why I tried passing cookies, without success:
import mechanicalsoup
import pandas as pd
import sqlite3
import requests
response = requests.get('https://opensea.io/rankings?sortBy=one_day_volume')
responsecookies = response.cookies
print(response.headers)
# {'Date': 'Fri, 15 Apr 2022 15:50:21 GMT', 'Content-Type': 'text/html;
# charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive',
# 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'same-origin',
# 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate,
# post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT',
# 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
# 'Set-Cookie': '__cf_bm=tumOZlW184DvFKZxjESM4RmsFtWcSWCsULENv42SGPE-1650037821-0-AbEEcLHRXFdyj8qFXj3yD6tHIjU0MLj5Sjq8dITWab+S7w8kxgOW38ZODJqp9mwl3WuLK+ub4Yu1W3kxcvX2C3Q=;
# path=/; expires=Fri, 15-Apr-22 16:20:21 GMT; domain=.opensea.io; HttpOnly; Secure;
# SameSite=None', 'Vary': 'Accept-Encoding', 'Strict-Transport-Security': 'max-age=0;
# includeSubDomains; preload', 'X-Content-Type-Options': 'nosniff', 'Server': 'cloudflare',
# 'CF-RAY': '6fc5d61fab7a9b9b-FRA', 'Content-Encoding': 'gzip'}
print(responsecookies)
#<RequestsCookieJar[<Cookie __cf_bm=tumOZlW184DvFKZxjESM4RmsFtWcSWCsULENv42SGPE-1650037821-0-AbEEcLHRXFdyj8qFXj3yD6tHIjU0MLj5Sjq8dITWab+S7w8kxgOW38ZODJqp9mwl3WuLK+ub4Yu1W3kxcvX2C3Q= for .opensea.io/>]>
browser = mechanicalsoup.StatefulBrowser()
browser.open("https://opensea.io/rankings?sortBy=one_day_volume", cookies=responsecookies)
# <Response [403]>
How do I best analyze which parameters my request must contain,
and how do I pass them correctly?
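To make the first part of the question concrete, here is a minimal sketch of how the headers that requests would send can be inspected without contacting the site, so they can be compared against what a real browser sends (for example in the Network tab of the browser's developer tools). The helper name outgoing_headers and the header values in the second session are hypothetical and only for illustration; nothing here is confirmed to get past the 403. MechanicalSoup's StatefulBrowser keeps a requests session in its session attribute, so the same check should apply to it.

```python
import requests

URL = "https://opensea.io/rankings?sortBy=one_day_volume"


def outgoing_headers(session, url):
    """Return the headers `session` would send for a GET to `url`.

    The request is only prepared, never sent, so no network access happens.
    """
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)


# Headers of an untouched session: note the python-requests User-Agent.
default_headers = outgoing_headers(requests.Session(), URL)

# Hypothetical browser-like values, assumed for illustration only.
custom = requests.Session()
custom.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
})
custom_headers = outgoing_headers(custom, URL)

# Print both sets side by side to see what changed.
for name in sorted(set(default_headers) | set(custom_headers)):
    print(f"{name}: {default_headers.get(name)!r} -> {custom_headers.get(name)!r}")
```

This only shows what my side sends. Whether the server also expects something that plain HTTP requests cannot provide is exactly what I am unsure about.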
Thank you for reading this.