Requests: proxies not working in people_also_ask module
I am scraping Google search results using the people_also_ask module. The module itself doesn't provide a way to use proxies, so I added them to the module manually. When Google blocked me, I printed the response status, and it said that my own IP address had been banned from sending requests. The code I added to the people_also_ask module to use proxies is:
    proxies = {
        'http': "http://username:password@ip:port"
    }
    response = SESSION.get(URL, params=params, headers=HEADERS, proxies=proxies)
I know this is an illegal activity, but I mainly want to understand why it happens, for educational purposes. I think the code that extracts the data is irrelevant, so I am only adding the simple code that sends requests using the people_also_ask module:
    import people_also_ask as paa
    queries = ["how to boil eggs", "how to make cake", "price of poco f1", "price of wooden table", "best soap in us", "how much tesla worth"]
    for query in queries:
        questions = paa.get_related_questions(query, 40)
Note: The changes were made in the first function, search(), in google.py of the people_also_ask module.
Note: I can run searches from my browser without any problem. Why does Google let me search from the browser but block requests from the script?
Comments (1)
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send an HTTP GET request via the proxy server, your program sends the request to the proxy, and the proxy then forwards it to the actual destination.
When it does so, the proxy can include your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, so the website knows that the request was sent on behalf of 122.126.64.43.
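To check whether a given proxy leaks your address this way, one option is to look at the headers as the destination received them and see whether your real IP shows up in X-Forwarded-For. Below is a minimal sketch of that check. The function names and the sample headers dict are made up for illustration (only the 122.126.64.43 address comes from the example above). In practice you would get the headers from some endpoint that echoes them back, which is not shown here.

```python
def forwarded_for_chain(headers):
    """Split an X-Forwarded-For value into its list of addresses.

    By convention the left-most entry is the original client and each
    proxy on the way appends the address it received the request from.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-forwarded-for", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def proxy_leaks_ip(headers, real_ip):
    """Return True if real_ip appears anywhere in X-Forwarded-For."""
    return real_ip in forwarded_for_chain(headers)


# Hypothetical headers as seen by the destination site (illustration only).
seen_by_site = {
    "Host": "www.example.com",
    "X-Forwarded-For": "122.126.64.43",
}

if proxy_leaks_ip(seen_by_site, "122.126.64.43"):
    print("The proxy is passing your real IP on to the site.")
else:
    print("Your IP does not appear in X-Forwarded-For.")
```

This only looks at X-Forwarded-For. A proxy could expose the client address through other headers as well, so a clean result here is not proof of anonymity.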
Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own Squid proxy server and want to stop it from setting the X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
I don't take any credit for this answer. I copied it from the following post that I found: Python Requests module - proxy not working
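One more thing that may be worth checking, separately from the header issue: the proxies dict in the question only has an 'http' key. Requests chooses the proxy by the scheme of the URL being fetched, so if URL in google.py is an https:// address (an assumption, since the question doesn't show it), no entry in that dict would match and the request would not use the proxy from it. The sketch below is a simplified, standard-library model of that scheme-keyed lookup, not the library's actual code. The pick_proxy name is hypothetical, and the proxy strings are the placeholders from the question.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified model of a scheme-keyed proxy lookup.

    Tries the most specific key first (scheme plus host), then the bare
    scheme, then the catch-all "all" keys.
    """
    parts = urlparse(url)
    candidates = [
        f"{parts.scheme}://{parts.hostname}",  # e.g. "https://www.google.com"
        parts.scheme,                          # e.g. "https"
        f"all://{parts.hostname}",
        "all",
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None  # no entry in this mapping matched the URL


# Placeholder credentials and address, exactly as in the question.
http_only = {"http": "http://username:password@ip:port"}
both_schemes = {
    "http": "http://username:password@ip:port",
    "https": "http://username:password@ip:port",
}

print(pick_proxy("https://www.google.com/search", http_only))     # no match
print(pick_proxy("https://www.google.com/search", both_schemes))  # proxy URL
```

If that assumption holds, adding an 'https' entry alongside 'http' would be the first thing to try before digging into proxy headers.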