How does an HTTP proxy work?
I searched the web for information about HTTP proxies and read the wiki articles about proxy servers.
But I still don't understand how an HTTP proxy works, stupid me.
Here is my assumption about how an HTTP proxy works:
If I set the HTTP proxy to a specific one, say Proxy_A, then when I start Chrome/IE and type in a specific URL, say URL_A, does Chrome/IE send the request directly to Proxy_A, and does Proxy_A then send the request to the real server of URL_A?
An HTTP proxy speaks the HTTP protocol. It is made specifically for HTTP connections, but it can be abused for other protocols as well (which is more or less standard practice already).
The browser (CLIENT) sends
GET http://SERVER/path HTTP/1.1
to the PROXY.
Now the PROXY will forward the actual request to the SERVER.
The SERVER only sees the PROXY as the connecting party and answers the PROXY just as it would answer a CLIENT.
The PROXY receives the response and forwards it back to the CLIENT.
It is a transparent process and nearly like communicating directly with the server, so supporting an HTTP proxy adds only a tiny overhead to a browser.
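The round trip described above can be tried locally. The following is a minimal sketch, not a production proxy: it assumes plain HTTP and GET only, and all names (`OriginHandler`, `ForwardingProxyHandler`, `start`, `fetch_via_proxy`) are hypothetical helpers made up for this illustration. Both servers bind to 127.0.0.1 on ports chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


class OriginHandler(BaseHTTPRequestHandler):
    """Stands in for the SERVER: replies with the request target it received."""

    def do_GET(self):
        body = f"origin saw {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


class ForwardingProxyHandler(BaseHTTPRequestHandler):
    """Stands in for the PROXY: takes an absolute URL and forwards the request."""

    def do_GET(self):
        # A client talking to a proxy puts the full URL in the request line,
        # so self.path looks like "http://host:port/path".
        target = urlsplit(self.path)
        path = target.path or "/"
        if target.query:
            path += "?" + target.query
        upstream = http.client.HTTPConnection(
            target.hostname, target.port or 80, timeout=5
        )
        upstream.request("GET", path)  # only the path goes to the SERVER
        response = upstream.getresponse()
        body = response.read()
        upstream.close()
        # Relay status and body back to the CLIENT.
        self.send_response(response.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    """Start a server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(proxy_port, url):
    """Act as the CLIENT: connect to the proxy and request the absolute URL."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    conn.request("GET", url)
    response = conn.getresponse()
    body = response.read().decode()
    conn.close()
    return response.status, body


if __name__ == "__main__":
    origin = start(OriginHandler)
    proxy = start(ForwardingProxyHandler)
    url = f"http://127.0.0.1:{origin.server_port}/hello"
    print(fetch_via_proxy(proxy.server_port, url))
    proxy.shutdown()
    origin.shutdown()
```

In this sketch the client only ever opens a connection to the proxy port, and the origin only ever sees a connection coming from the proxy.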
There are some additional headers that can be sent to identify the client and reveal that it is using a proxy.
Proxies sometimes change or add content within the data stream for various purposes.
Some proxies, for example, include your real IP in a special HTTP header, which can be logged server-side or read by the server's scripts.
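As a rough illustration of that header rewriting, here is a minimal sketch. The answer does not name a specific header; `X-Forwarded-For` and `Via` are used here as assumptions because they are commonly seen in practice. The function name `add_proxy_headers` and the proxy identifier `1.1 example-proxy` are hypothetical.

```python
def add_proxy_headers(headers, client_ip, proxy_id="1.1 example-proxy"):
    """Return a copy of the request headers as a proxy might forward them.

    Simplification: header names are treated as case-sensitive dict keys,
    although real HTTP header names are case-insensitive.
    """
    forwarded = dict(headers)

    # Append the client's IP so the server can see who is behind the proxy.
    previous_ips = forwarded.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = (
        f"{previous_ips}, {client_ip}" if previous_ips else client_ip
    )

    # Announce the proxy itself, keeping any proxies already in the chain.
    previous_via = forwarded.get("Via")
    forwarded["Via"] = f"{previous_via}, {proxy_id}" if previous_via else proxy_id

    return forwarded


if __name__ == "__main__":
    print(add_proxy_headers({"Host": "example.com"}, "203.0.113.7"))
```

A proxy that wants to hide the client would simply not add such headers, which is why the server cannot rely on them being present or truthful.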
Update:
Related to using proxies as a security/privacy feature
As described above, there is no direct communication between CLIENT and SERVER. Both parties only talk to the PROXY between them.
Nowadays the CLIENT is often a browser and the SERVER is often a web server (Apache, for example).
In such an environment users often trust the PROXY to be secure and not leak their identity.
However, there are many possible ways to ruin this security model because of the complex software frameworks running in the browser.
Flash or Java applets are a perfect example of how a proxy connection can get broken: both might not care much about the proxy settings of their parent application (the browser).
Another example is DNS requests, which can reach the destination nameserver without going through the PROXY, depending on the PROXY and the application settings.
Another example would be cookies or your browser meta footprint (resolution, response times, user-agent, etc.), either of which might identify you if the web server already knows you from the past (or meets you again without the proxy).
And in the end, the proxy itself needs to be trusted, as it can read all the data that goes through it, and on top of that it might even be able to break your SSL security (read up on man-in-the-middle attacks).
Where to get proxies from
Proxies can be bought as a service, scanned for, or simply run by yourself.
Public proxies
These are the most often used proxies, and the usual term "public" is quite misleading.
A better term would be "open proxies". If you run a proxy server without a firewall or authentication, anyone in the world can find it and abuse it.
The large majority of companies selling proxies just scan the internet for such proxies, or they use hacked Windows computers (botnets), and sell them for mostly illegal/spam activity.
Most modern countries can treat the use of an open proxy without authorization as abuse. It is a very common thing to do, but it can actually lead to prison time.
It's possible to scan for proxies by searching the internet for open ports; a typical free program for this would be https://nmap.org
A word of caution: larger-scale scanning will almost certainly get your internet connection banned by your ISP.
Paid proxies
Here we have 4 types of proxies:
1) Paid public (open) proxies
Basically, these sellers sell or resell huge lists of proxies that are regularly refreshed to remove dead ones.
The proxies are abused on a massive scale and usually blacklisted on most sites, including Google.
Additionally, those proxies are usually very unstable and very slow.
The large majority of these proxies are simply abusing misconfigured servers.
It's a very competitive "market"; a Google search will turn up many examples.
2) Paid hacked (botnet) proxies
These abuse computers, mostly Internet-of-Things devices or Windows desktops, as proxy hosts. The attackers use them on a large scale for various illegal purposes.
Sellers usually call them "residential proxies" to hide their illegal nature.
Using such a proxy is without doubt illegal, and the abused user can easily log "your" IP if you connect to it and could even hijack your connection to the destination.
Depending on the source, those IPs are not blacklisted, so the "quality" is much better than that of public proxies.
3) Paid shared proxies
These are datacenter proxies, usually legal and potentially with a fast uplink.
Because there is so much e-commerce spam going on, those IPs are massively abused and usually found in blacklists.
A typical use would be circumvention of Craigslist restrictions or geo-restrictions.
4) Paid private/dedicated proxies
"Private" means dedicated. If the operator is professional, it means your proxy is not shared with other people.
These are often used for more professional and legal activity, especially when the proxy IP is rented for a longer period.
A well known operator would be https://us-proxies.com
Own proxies
Running your own proxy is possible as well; there are various open-source projects available.
The most commonly used proxy server is https://squid-cache.org
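Once a proxy of your own is running, a client still has to be pointed at it. A minimal sketch using Python's standard `urllib.request` follows. It assumes the proxy listens on 127.0.0.1 port 3128 (3128 is Squid's default port; the address and the helper name `make_proxy_opener` are illustrative assumptions, not something the answer specifies).

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build a URL opener that routes http and https requests via proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


# Assumed address of a locally running proxy; adjust to your own setup.
opener = make_proxy_opener("http://127.0.0.1:3128")
```

Calling `opener.open("http://example.com/")` would then go through the proxy, provided one is actually listening at that address.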
To add to John's great answer above, one important step is the initial CONNECT handshake between PROXY and CLIENT. From the WebSocket RFC:
This is the same request that a CLIENT uses to open an SSL tunnel, which essentially uses a proxy.
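The RFC excerpt this answer refers to is not reproduced on this page, so the following is only a sketch of the general shape of a CONNECT request, not the quoted text. The host `example.com` and the helper name `build_connect_request` are assumptions for illustration.

```python
def build_connect_request(host, port):
    """Return the bytes a CLIENT sends to a PROXY to ask for a tunnel."""
    authority = f"{host}:{port}"
    lines = [
        f"CONNECT {authority} HTTP/1.1",  # target is host:port, not a URL
        f"Host: {authority}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_connect_request("example.com", 443).decode())
```

If the proxy answers with a 2xx status, it stops interpreting the traffic and just relays bytes in both directions, so the client can run its SSL/TLS handshake with the server through the tunnel.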