Requests: proxy not working in the people_also_ask module

Posted 2025-01-29 00:31:56, 843 words, 4 views, 0 comments

I am scraping search results from Google using the people_also_ask module. The module itself doesn't have a method for using proxies, so I added proxies to the module manually. When Google blocked me, I printed the response status, and it said that my IP address had been banned from sending requests. The code I added to the people_also_ask module to use proxies is:

            proxies = {
                'http': "http://username:password@ip:port"
            }
            response = SESSION.get(URL, params=params, headers=HEADERS, proxies=proxies)

I know it is an illegal activity, but I want to know why this happens, mainly for educational purposes. I think the code that extracts the data is irrelevant, so I am only adding simple code that sends requests using the people_also_ask module:

import people_also_ask as paa
queries = ["how to boil eggs","how to make cake","price of poco f1","price of wooden table","best soap in us","how much tesla worth"]
for query in queries:
    questions = paa.get_related_questions(query, 40)

Note: The changes were made in the first function, search(), in google.py of the people_also_ask module.

Note: I can run searches from the browser without any problem. Why does Google let me search from the browser but block requests from the script?

Comments (1)

撩心不撩汉 2025-02-05 00:31:56

The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send the HTTP GET request via the proxy server, the request sent by your program to the proxy server is:

GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0

Now, when the proxy server sends this request to the actual destination, it sends:

GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive

As you can see, the proxy puts your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, so the website knows that the request was sent on behalf of 122.126.64.43.
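To make the point concrete, here is a minimal sketch of how a destination server could read those headers. It is not Google's actual detection logic, and the helper names `leaked_client_ips` and `proxy_is_visible` are made up for illustration. It also assumes a plain-HTTP request that the proxy can read and rewrite, as in the example above.

```python
import ipaddress

# Headers a forwarding proxy commonly adds; any of them tells the
# destination that the request passed through a proxy.
REVEALING_HEADERS = ("x-forwarded-for", "forwarded", "via")


def leaked_client_ips(headers):
    """Return the valid IP addresses listed in X-Forwarded-For, in order."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-forwarded-for", "")
    ips = []
    for part in raw.split(","):
        candidate = part.strip()
        try:
            ipaddress.ip_address(candidate)  # raises ValueError if not an IP
        except ValueError:
            continue
        ips.append(candidate)
    return ips


def proxy_is_visible(headers):
    """Return True if any proxy-revealing header is present."""
    names = {name.lower() for name in headers}
    return any(header in names for header in REVEALING_HEADERS)


# The request as forwarded by the proxy in the example above.
forwarded = {
    "Host": "www.whatsmybrowser.org",
    "User-Agent": "python-requests/2.10.0",
    "Via": "1.1 naxserver (squid/3.1.8)",
    "X-Forwarded-For": "122.126.64.43",
}

print(leaked_client_ips(forwarded))  # ['122.126.64.43']
print(proxy_is_visible(forwarded))   # True
```

If the proxy strips these headers, both checks come back empty for the same request, which is the behaviour the squid setting mentioned below is meant to achieve.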

Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239

If you want to host your own squid proxy server and want to disable setting X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/

I take no credit for this answer; I copied it from the following post: Python Requests module - proxy not working
