Different results between requests.get, aiohttp's get, and the httpx module

Posted 2025-01-22 03:32:45 · 1078 words · 0 views · 0 comments


I am trying to access a site with bot prevention.

With the following script using requests, I can access the site:

request = requests.get(url, headers={**HEADERS, 'Cookie': cookies})

and I get the desired HTML. But when I use aiohttp:

async def get_data(session: aiohttp.ClientSession, url, cookies):
    async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
        text = await response.text()
        print(text)

I get the bot prevention page as the response.

These are the headers I use for all the requests:

HEADERS = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Host': 'www.dnb.com',
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br'
} 

I have compared the request headers of both requests.get and aiohttp, and they are identical.

Is there any reason the results are different? If so, why?

EDIT: I've checked the httpx module, and the problem occurs there as well, with both httpx.Client() and httpx.AsyncClient().

response = httpx.request('GET', url, headers={**HEADERS, 'Cookie': cookies})

doesn't work either. (That's the synchronous form.)


Comments (1)

憧憬巴黎街头的黎明 2025-01-29 03:32:45


I tried capturing packets with Wireshark to compare requests and aiohttp.

Server:

    import http.server

    # Minimal file server to capture requests against.
    server = http.server.HTTPServer(
        ("localhost", 8080), http.server.SimpleHTTPRequestHandler
    )
    server.serve_forever()

with requests:

    import requests

    url = 'http://localhost:8080'
    HEADERS = {'Content-Type': 'application/json'}
    cookies = ''
    request = requests.get(url, headers={**HEADERS, 'Cookie': cookies})

requests packet:

    GET / HTTP/1.1
    Host: localhost:8080
    User-Agent: python-requests/2.27.1
    Accept-Encoding: gzip, deflate, br
    Accept: */*
    Connection: keep-alive
    Content-Type: application/json
    Cookie: 

with aiohttp:

    import aiohttp
    import asyncio
    
    url = 'http://localhost:8080'
    HEADERS = {'Content-Type': 'application/json'}
    cookies = ''
    async def get_data(session: aiohttp.ClientSession, url, cookies):
        async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
            text = await response.text()
            print(text)
    
    async def main():
        async with aiohttp.ClientSession() as session:
            await get_data(session,url,cookies)
    
    asyncio.run(main())

aiohttp packet:

    GET / HTTP/1.1
    Host: localhost:8080
    Content-Type: application/json
    Cookie: 
    Accept: */*
    Accept-Encoding: gzip, deflate
    User-Agent: Python/3.10 aiohttp/3.8.1

If the site accepts the packets from requests, you could try making the aiohttp packet identical by setting the headers explicitly:

    HEADERS = {
        'User-Agent': 'python-requests/2.27.1',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': '*/*',
        'Connection': 'keep-alive',
        'Content-Type': 'application/json',
        'Cookie': ''
    }
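Note that aiohttp also injects its own defaults (User-Agent, Accept, Accept-Encoding) alongside whatever you pass; the `skip_auto_headers` argument to `aiohttp.ClientSession` suppresses those auto-generated headers so only your values go out. A minimal sketch, assuming the same headers as above:

```python
import aiohttp

# Headers copied from the working requests packet (see above).
HEADERS = {
    'User-Agent': 'python-requests/2.27.1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': '*/*',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json',
    'Cookie': '',
}

async def fetch(url: str) -> str:
    # skip_auto_headers tells aiohttp not to inject its own defaults
    # for these header names, so only the values in HEADERS are sent.
    async with aiohttp.ClientSession(
        skip_auto_headers=['User-Agent', 'Content-Type']
    ) as session:
        async with session.get(url, headers=HEADERS) as response:
            return await response.text()
```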

If you haven't already, I suggest capturing the request with Wireshark to make sure aiohttp isn't messing with your headers.

You can also try other user-agent strings, or send the headers in a different order. The order is not supposed to matter, but some sites check it anyway for bot protection (as in this question, for example).
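If Wireshark is overkill, a throwaway stdlib server can show you the headers exactly as the client sent them, in order (a minimal sketch; the handler name and captured dict are made up for illustration):

```python
import http.client
import http.server
import threading

captured = {}

class CaptureHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the headers exactly as received, preserving order.
        captured['headers'] = list(self.headers.items())
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # keep request logging quiet

# Bind to port 0 so the OS picks a free port.
server = http.server.HTTPServer(('localhost', 0), CaptureHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Point any client here; http.client keeps this example stdlib-only.
conn = http.client.HTTPConnection('localhost', port)
conn.request('GET', '/', headers={
    'User-Agent': 'python-requests/2.27.1',
    'Accept': '*/*',
})
body = conn.getresponse().read()
server.shutdown()

print(captured['headers'])
```

Swap `http.client` for requests, aiohttp, or httpx pointed at `localhost:{port}` to compare what each library actually puts on the wire.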
