Selenium and requests: access denied by the site
I am trying to scrape the products from a site. I first tried it with requests (with headers), but my list was empty, and when I printed s I didn't get the same HTML as in my browser. So I tried Selenium with this code:
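```python
# Imports the snippet relies on (Selenium 4 style driver setup).
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

og_name_list = []
item = 'https://www.bstn.com/eu_nl/catalogsearch/result/?q=jordan&categories=Men~Footwear~Sneakers&raffle=No'

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--no-sandbox')

# executable_path= was deprecated in Selenium 4 and later removed; the driver path goes into a Service.
driver = webdriver.Chrome(service=Service('/Users/maurijnvd/Downloads/chromedriver 2'), options=options)
try:
    driver.get(item)
    html = driver.page_source
finally:
    driver.quit()  # close the browser even if the page load fails

s = BeautifulSoup(html, 'lxml')
# Collect the link of every product card in the search result grid.
names = s.find_all('a', class_='catalog-grid-item__name-link', href=True)
for name in names:
    namemp = name['href']
    og_name_list.append(namemp)

print(len(og_name_list))
print(s)
```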
But the output HTML contained an "Access denied" message:
0
<html class="no-js" lang="en-US"><!--<![endif]--><head>
<title>Access denied | www.bstn.com used Cloudflare to restrict access</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/main.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
<script type="text/javascript">
(function(){if(document.addEventListener&&window.XMLHttpRequest&&JSON&&JSON.stringify){var e=function(a){var c=document.getElementById("error-feedback-survey"),d=document.getElementById("error-feedback-success"),b=new XMLHttpRequest;a={event:"feedback clicked",properties:{errorCode:1020,helpful:a,version:1}};b.open("POST","https://sparrow.cloudflare.com/api/v1/event");b.setRequestHeader("Content-Type","application/json");b.setRequestHeader("Sparrow-Source-Key","c771f0e4b54944bebf4261d44bd79a1e");
b.send(JSON.stringify(a));c.classList.add("feedback-hidden");d.classList.remove("feedback-hidden")};document.addEventListener("DOMContentLoaded",function(){var a=document.getElementById("error-feedback"),c=document.getElementById("feedback-button-yes"),d=document.getElementById("feedback-button-no");"classList"in a&&(a.classList.remove("feedback-hidden"),c.addEventListener("click",function(){e(!0)}),d.addEventListener("click",function(){e(!1)}))})}})();
</script>
<script defer="" src="https://api.radar.cloudflare.com/beacon.js"></script>
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error hidden" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
<div class="p-0" id="cf-error-details">
<header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-15 antialiased">
<h1 class="inline-block md:block mr-2 md:mb-2 font-light text-60 md:text-3xl text-black-dark leading-tight">
<span data-translate="error">Error</span>
<span>1020</span>
</h1>
<span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">Ray ID: 6e759ee1dd9c76a1 •</span>
<span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">2022-03-05 20:32:23 UTC</span>
<h2 class="text-gray-600 leading-1.3 text-3xl lg:text-2xl font-light">Access denied</h2>
</header>
<section class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
<div class="w-1/2 md:w-full" id="what-happened-section">
<h2 class="text-3xl leading-tight font-normal mb-4 text-black-dark antialiased" data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</section>
<div class="py-8 text-center" id="error-feedback">
<div id="error-feedback-survey">
Was this page helpful?
<button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-yes" type="button">Yes</button>
<button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-no" type="button">No</button>
</div>
<div class="feedback-success feedback-hidden" id="error-feedback-success">
Thank you for your feedback!
</div>
</div>
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">6e759ee1dd9c76a1</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Your IP</span>: 2a02:a455:c03b:1:dc83:20cd:2a6b:cbaa</span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body></html>
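The page printed above is Cloudflare's block page (error 1020), not the product grid, which is why the link count is 0: the selector simply has nothing to match. A minimal sketch, using only the standard library, of recognising that page before parsing it. The function name looks_like_cloudflare_block is hypothetical, and the marker strings it looks for are taken from the output above; Cloudflare can change that template, so treat this as a heuristic.

```python
from html.parser import HTMLParser


class _BlockPageScanner(HTMLParser):
    """Collects the <title> text and every stripped text node of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""
        self.text_nodes = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.text_nodes.append(data.strip())


def looks_like_cloudflare_block(html: str) -> bool:
    """Heuristic check for Cloudflare's 'Access denied' page (hypothetical helper)."""
    scanner = _BlockPageScanner()
    scanner.feed(html)
    scanner.close()
    title = scanner.title.lower()
    if "access denied" in title and "cloudflare" in title:
        return True
    # Fallback: the error code and heading appear as separate text nodes,
    # e.g. <span>1020</span> and <h2>Access denied</h2>.
    return "1020" in scanner.text_nodes and "Access denied" in scanner.text_nodes
```

Calling it right after html = driver.page_source (and raising an error when it returns True) turns the silent 0 into an explicit "blocked" result, so a blocked request is not mistaken for a wrong CSS selector.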
I want to get into the site but can't seem to. Any ideas or help are appreciated; I added a screenshot (made by Selenium) of the error. I can access the site in my normal browser. I would prefer to scrape it with requests, but Selenium is also okay. Thanks!
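To tell "blocked" apart from "selector is wrong", it can help to test the extraction step on a copy of the page saved from the normal browser. Below is a rough standard-library equivalent of the question's s.find_all('a', class_='catalog-grid-item__name-link', href=True) loop. The names ProductLinkParser and extract_product_links are hypothetical, and the HTML in the test is an invented fragment, not bstn.com's real markup.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collects href values of <a> tags that carry a given CSS class.

    Roughly mirrors find_all('a', class_=..., href=True): the class must be one
    of the tag's space-separated classes and the tag must have an href value.
    """

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.css_class in classes and href is not None:
            self.hrefs.append(href)


def extract_product_links(html: str, css_class: str = "catalog-grid-item__name-link") -> list:
    parser = ProductLinkParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

If this returns links for the browser-saved page but nothing for the HTML Selenium or requests received, the problem is the response, not the parsing.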
I'm no expert at this, but based on the information you're providing (especially the screenshot), the website you're contacting seems to be protected by Cloudflare: what you get back is its "Access denied" page with error 1020, which Cloudflare shows when a request is blocked by the site's firewall rules. So I guess you can't access it with these kinds of automated requests, which would mean it's impossible to request the site's products with Selenium.
But, as I said, I'm no expert, so correct me if I'm wrong.
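Cloudflare usually serves its 1020 page with HTTP status 403, and requests does not raise on 4xx responses unless you call response.raise_for_status(), so the block page gets parsed like any other page and the list silently comes back empty. A minimal sketch of a fetch that fails loudly instead, assuming the block arrives as a 403. It uses the standard library's urllib rather than requests so it runs without extra packages; the names fetch_html and BlockedError are hypothetical, and it does not get around the block.

```python
import urllib.error
import urllib.request
from typing import Callable


class BlockedError(RuntimeError):
    """The server refused the request (HTTP 403), e.g. a Cloudflare block page."""


def fetch_html(
    url: str,
    headers: dict,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 15.0,
) -> str:
    """Fetch a page, raising BlockedError on HTTP 403 instead of returning HTML.

    The opener parameter exists only so the function can be tested offline.
    """
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # The body is an error page, not the catalog; stop instead of parsing it.
            raise BlockedError(f"{url} refused the request (HTTP {err.code})") from err
        raise
```

With requests, the equivalent check is response.status_code == 403 (or response.raise_for_status()) before handing response.text to BeautifulSoup.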