Barnes and Noble website does not treat HeadlessChrome as a real user

Posted on 2025-01-10 09:08:54

I'm using Chrome-Headless because it acts like a real browser, but when I direct chrome-headless + Selenium to this Barnes and Noble link:

https://www.barnesandnoble.com/w/the-woman-they-could-not-silence-kate-moore/1138489968?ean=9781728242576

I get this response, without navigating to any other page:

<html><head>
<title>Access Denied</title>


</head><body>
<h1>Access Denied</h1>
 
You don't have permission to access "https://www.barnesandnoble.com/w/the-woman-they-could-not-silence-kate-moore/1138489968?ean=9781728242576" on this server.<p>
Reference #


</p></body></html>

I understand that I would need to add headers and so on, but how is this different from a regular GET request + headers?

What else is giving Chrome-headless away to Barnes and Noble in particular?

What am I doing wrong?

What am I missing?

离笑几人歌 2025-01-17 09:08:54

You are seeing the following Access Denied error page:

Access Denied

due to the presence of the keyword Headless within the user-agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36
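To make the difference concrete, here is a minimal sketch of the string-level change involved. The helper names `is_headless_user_agent` and `to_regular_user_agent` are hypothetical and only for illustration; the sketch assumes that the `HeadlessChrome` product token is the only difference between the headless and the regular user-agent string, and it says nothing about other signals the site may check.

```python
# Hypothetical helpers; they operate on plain strings and need no browser.
HEADLESS_MARKER = "HeadlessChrome"


def is_headless_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent string carries the headless marker."""
    return HEADLESS_MARKER in user_agent


def to_regular_user_agent(user_agent: str) -> str:
    """Swap the 'HeadlessChrome' product token for plain 'Chrome'."""
    return user_agent.replace(HEADLESS_MARKER, "Chrome")


# The default headless user agent quoted above.
default_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
)
regular_ua = to_regular_user_agent(default_ua)

print(is_headless_user_agent(default_ua))
print(is_headless_user_agent(regular_ua))
print(regular_ua)
```

The resulting `regular_ua` is the kind of value that can be passed to Chrome through the `user-agent=` argument.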

Solution

You can override the default user agent with a regular Chrome user agent as follows:

Code Block:

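from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# Run headless via the command-line switch; the options.headless property is deprecated in newer Selenium 4 releases
options.add_argument("--headless")
options.add_argument("start-maximized")
# Hide the most obvious automation markers
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
# Replace the default HeadlessChrome user agent with a regular Chrome one
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.barnesandnoble.com/w/the-woman-they-could-not-silence-kate-moore/1138489968?ean=9781728242576")
driver.save_screenshot("barnesandnoble.png")
# Close the browser and stop the driver process
driver.quit()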
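Before requesting the page, it can help to confirm that the override actually took effect. The following is a rough sketch, not part of the original answer: `reported_user_agent` and `assert_not_headless` are hypothetical helper names, and `driver` is assumed to be the Selenium WebDriver created above (any object with an `execute_script` method would do).

```python
def reported_user_agent(driver) -> str:
    """Ask the page's JavaScript context which user agent the browser reports."""
    return driver.execute_script("return navigator.userAgent")


def assert_not_headless(driver) -> str:
    """Return the reported user agent, or raise if it still exposes headless mode."""
    user_agent = reported_user_agent(driver)
    if "Headless" in user_agent:
        raise RuntimeError(f"user agent still exposes headless mode: {user_agent}")
    return user_agent
```

A possible usage is to call `print(assert_not_headless(driver))` right after creating the driver and before `driver.get(...)`. If this check passes and the site still answers with Access Denied, the block is presumably based on something other than the user-agent string.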