Selenium access denied and site requests

Posted on 2025-01-12 10:33:47 · 6688 characters · 4 views · 0 comments

I am trying to scrape the products from a site. I first tried requests (with headers), but my list was empty, and when I printed s I didn't get the same HTML as in my browser. So I tried Selenium with this code:

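        from bs4 import BeautifulSoup
        from selenium import webdriver
        from selenium.webdriver.chrome.service import Service

        og_name_list = []
        item = 'https://www.bstn.com/eu_nl/catalogsearch/result/?q=jordan&categories=Men~Footwear~Sneakers&raffle=No'
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--no-sandbox')
        # Selenium 4 style: the chromedriver path goes through a Service object
        # (older releases accepted executable_path= directly)
        driver = webdriver.Chrome(service=Service('/Users/maurijnvd/Downloads/chromedriver 2'), options=options)
        driver.get(item)
        html = driver.page_source
        driver.quit()  # close the browser once the page source is captured
        s = BeautifulSoup(html, 'lxml')
        # Collect the href of every product link in the result grid
        names = s.find_all('a', class_='catalog-grid-item__name-link', href=True)
        for name in names:
            og_name_list.append(name['href'])
        print(len(og_name_list))
        print(s)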

But the output HTML contained an access denied message:

            0
            <html class="no-js" lang="en-US"><!--<![endif]--><head>
            <title>Access denied | www.bstn.com used Cloudflare to restrict access</title>
            <meta charset="utf-8"/>
            <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
            <meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
            <meta content="noindex, nofollow" name="robots"/>
            <meta content="width=device-width,initial-scale=1" name="viewport"/>
            <link href="/cdn-cgi/styles/main.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
            <script type="text/javascript">
            (function(){if(document.addEventListener&&window.XMLHttpRequest&&JSON&&JSON.stringify){var e=function(a){var c=document.getElementById("error-feedback-survey"),d=document.getElementById("error-feedback-success"),b=new XMLHttpRequest;a={event:"feedback clicked",properties:{errorCode:1020,helpful:a,version:1}};b.open("POST","https://sparrow.cloudflare.com/api/v1/event");b.setRequestHeader("Content-Type","application/json");b.setRequestHeader("Sparrow-Source-Key","c771f0e4b54944bebf4261d44bd79a1e");
            b.send(JSON.stringify(a));c.classList.add("feedback-hidden");d.classList.remove("feedback-hidden")};document.addEventListener("DOMContentLoaded",function(){var a=document.getElementById("error-feedback"),c=document.getElementById("feedback-button-yes"),d=document.getElementById("feedback-button-no");"classList"in a&&(a.classList.remove("feedback-hidden"),c.addEventListener("click",function(){e(!0)}),d.addEventListener("click",function(){e(!1)}))})}})();
            </script>
            <script defer="" src="https://api.radar.cloudflare.com/beacon.js"></script>
            </head>
            <body>
            <div id="cf-wrapper">
            <div class="cf-alert cf-alert-error cf-cookie-error hidden" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
            <div class="p-0" id="cf-error-details">
            <header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-15 antialiased">
            <h1 class="inline-block md:block mr-2 md:mb-2 font-light text-60 md:text-3xl text-black-dark leading-tight">
            <span data-translate="error">Error</span>
            <span>1020</span>
            </h1>
            <span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">Ray ID: 6e759ee1dd9c76a1 •</span>
            <span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">2022-03-05 20:32:23 UTC</span>
            <h2 class="text-gray-600 leading-1.3 text-3xl lg:text-2xl font-light">Access denied</h2>
            </header>
            <section class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
            <div class="w-1/2 md:w-full" id="what-happened-section">
            <h2 class="text-3xl leading-tight font-normal mb-4 text-black-dark antialiased" data-translate="what_happened">What happened?</h2>
            <p>This website is using a security service to protect itself from online attacks.</p>
            </div>
            </section>
            <div class="py-8 text-center" id="error-feedback">
            <div id="error-feedback-survey">
                        Was this page helpful?
                        <button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-yes" type="button">Yes</button>
            <button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-no" type="button">No</button>
            </div>
            <div class="feedback-success feedback-hidden" id="error-feedback-success">
                        Thank you for your feedback!
                     </div>
            </div>
            <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
            <p class="text-13">
            <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">6e759ee1dd9c76a1</strong></span>
            <span class="cf-footer-separator sm:hidden">•</span>
            <span class="cf-footer-item sm:block sm:mb-1"><span>Your IP</span>: 2a02:a455:c03b:1:dc83:20cd:2a6b:cbaa</span>
            <span class="cf-footer-separator sm:hidden">•</span>
            <span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
            </p>
            </div><!-- /.error-footer -->
            </div><!-- /#cf-error-details -->
            </div><!-- /#cf-wrapper -->
            <script type="text/javascript">
              window._cf_translation = {};
            
            
            </script>
            </body></html>

I want to get into the site but can't seem to. Any ideas or help are appreciated. I added a screenshot (taken by Selenium) of the error. I can access the site in my normal browser. I would prefer to scrape it with requests, but Selenium is also okay. Thanks.

Comments (1)

浪荡不羁 2025-01-19 10:33:47

I'm no expert at this, but according to the information you're providing (especially the screenshot), it seems that the website you're contacting is protected by Cloudflare, so I guess you aren't able to access it with this kind of command. That would mean it's impossible to request the site's products with Selenium.

But, as I said, I'm no expert, so correct me if I'm wrong.
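
For what it's worth, here is a minimal sketch of how one might check in code that the returned HTML is the Cloudflare block page rather than the product listing, before trying to parse any links out of it. It uses only the standard library and relies on the markup visible in the output posted in the question (the title text, the errorCode value in the inline script, and the Ray ID line). The function name describe_block_page is made up for illustration, and Cloudflare may change this page at any time, so treat it as an assumption-laden helper, not a reliable detector.

```python
import re


def describe_block_page(html):
    """Return details of a Cloudflare 'Access denied' page, or None.

    Hypothetical helper: the patterns below are based only on the HTML
    shown in the question, not on any documented Cloudflare format.
    """
    title_match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = title_match.group(1).strip() if title_match else ""
    lowered = title.lower()
    if "access denied" not in lowered or "cloudflare" not in lowered:
        return None  # does not look like the block page from the question

    # The error code appears in the inline feedback script as errorCode:1020
    code_match = re.search(r"errorCode:\s*(\d+)", html)
    # The Ray ID appears as plain text, optionally wrapped in a <strong> tag
    ray_match = re.search(r"Ray ID:\s*(?:<[^>]+>)?\s*([0-9a-f]+)", html)

    return {
        "title": title,
        "error_code": int(code_match.group(1)) if code_match else None,
        "ray_id": ray_match.group(1) if ray_match else None,
    }


if __name__ == "__main__":
    # Shortened excerpt of the output posted in the question
    sample = (
        "<html><head>"
        "<title>Access denied | www.bstn.com used Cloudflare to restrict access</title>"
        "<script>a={properties:{errorCode:1020,version:1}}</script>"
        "</head><body><span>Ray ID: 6e759ee1dd9c76a1 •</span></body></html>"
    )
    print(describe_block_page(sample))
```

In the question's script, this could be called on driver.page_source (or on the text of a requests response) so that an empty list of product links is reported as "blocked" instead of being mistaken for "no products found".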
