Terminating large numbers of SSL connections cost-effectively
I have recently set up a Node.js based web socket server that has been tested to handle around 2,000 new connection requests per second on a small EC2 instance (m1.small). Considering the cost of a m1.small instance, and the ability to put multiple instances behind a WebSocket capable proxy server such as HAProxy, we are very happy with the results.
However, we realised we had not yet done any testing with SSL, so we looked into a number of SSL options. It became apparent that terminating SSL connections at the proxy server is ideal, because the proxy server can then inspect the traffic and insert headers such as X-Forwarded-For so that the server knows which IP the request came from.
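As a rough illustration of what the backend does with that header, here is a minimal sketch. The function name `client_ip` and its two parameters are hypothetical, chosen only for this example, and it assumes the header is set by a proxy you control (a client can send any value it likes).

```python
def client_ip(headers, peer_ip):
    """Return the original client IP as reported by a trusted proxy.

    `headers` is a dict of request headers with lower-cased names and
    `peer_ip` is the address of the TCP peer (normally the proxy).
    Both names are hypothetical, chosen for this example.
    """
    forwarded = headers.get("x-forwarded-for", "")
    # The header is a comma-separated list; the left-most entry is the
    # address seen by the first proxy in the chain.
    first = forwarded.split(",")[0].strip()
    # Fall back to the TCP peer when the header is missing or empty.
    return first or peer_ip
```

For example, `client_ip({"x-forwarded-for": "203.0.113.7, 10.0.0.5"}, "10.0.0.5")` yields the left-most address, and a request without the header falls back to the peer address.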
So I looked into a number of solutions such as Pound, stunnel and stud, all of which allow incoming connections on port 443 to be terminated and then passed on to HAProxy on port 80, which in turn passes the connection on to the web servers. Unfortunately, I found that sending traffic to the SSL termination proxy server on a c1.medium (High CPU) instance very quickly consumed all CPU resources, and did so at a rate of only 50 or so requests per second. I tried all three of the solutions listed above, and all of them performed roughly the same, as I assume they all rely on OpenSSL under the hood anyway. I tried using a 64-bit extra large High CPU instance (c1.xlarge) and found that performance only scales linearly with cost. So based on EC2 pricing, I'd need to pay roughly $600 per month for 200 SSL requests per second, as opposed to $60 per month for 2,000 non-SSL requests per second. The former price becomes economically unviable very quickly when we start planning to accept 1,000s or 10,000s of requests per second.
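To make that arithmetic explicit, here is a rough sketch that extrapolates the figures quoted above to 10,000 requests per second. It assumes cost keeps scaling linearly with request rate, as observed in the tests; the function name `monthly_cost` is made up for this example.

```python
# Figures quoted in the question (USD per month and requests per second).
SSL_COST_PER_MONTH = 600.0
SSL_REQUESTS_PER_SEC = 200
PLAIN_COST_PER_MONTH = 60.0
PLAIN_REQUESTS_PER_SEC = 2000


def monthly_cost(target_rps, cost_per_month, rps_at_that_cost):
    """Scale a measured monthly cost linearly to a target request rate."""
    return target_rps * cost_per_month / rps_at_that_cost


# 10,000 req/s is the upper end of the planning range mentioned above.
ssl_cost = monthly_cost(10_000, SSL_COST_PER_MONTH, SSL_REQUESTS_PER_SEC)
plain_cost = monthly_cost(10_000, PLAIN_COST_PER_MONTH, PLAIN_REQUESTS_PER_SEC)

print(f"SSL:     ${ssl_cost:,.0f} per month")
print(f"non-SSL: ${plain_cost:,.0f} per month")
print(f"ratio:   {ssl_cost / plain_cost:.0f}x")
```

Under these assumptions the SSL tier costs about 100 times as much per request as the non-SSL tier.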
I also tried terminating SSL using Node.js' https server, and the performance was very similar to Pound, stunnel and stud, so there is no clear advantage to that approach.
So what I am hoping someone can help with is how to get around this ridiculous cost we have to absorb to provide SSL connections. I have heard that SSL hardware accelerators provide much better performance, as the hardware is designed for SSL encryption and decryption, but as we are currently using Amazon EC2 for all of our servers, using SSL hardware accelerators is not an option unless we have a separate data centre with physical servers. I am just struggling to see how the likes of Amazon, Google and Facebook can provide all their traffic over SSL when the cost of this is so high. There must be a better solution out there.
Any advice or ideas would be greatly appreciated.
Thanks
Matt
Comments (4)
I do not know much about the CPU power available on different EC2 instances, but I assume your problem lies not with your choice of TLS-terminating proxy software, but with their configuration.
Without any configuration, I'm assuming all of them would offer all cipher suites they support, including (very) slow ones. And they'll probably let the client pick the one it likes best, too.
Not all TLS cipher suites are born equal; some have higher CPU costs than others, be it from the key exchange or the cipher itself.
Depending on the software used, there should be a way to specify a string of ciphers the server accepts (and also a way to make the server insist on that). For OpenSSL these work this way: http://www.openssl.org/docs/apps/ciphers.html#CIPHER_STRINGS
If you're going for speed, at least make sure you're not using ciphers that employ Diffie-Hellman (the non-elliptic-curve kind) key-exchanges.
To disable cipher suites using DH key exchange, make sure the string includes !DH at some point. You can test what string results in which ciphers being available with, for example,
    openssl ciphers -v 'HIGH:!aNULL:!DH:!ECDH'
This string disables both normal Diffie-Hellman as well as Elliptic Curve Diffie-Hellman key exchanges. This probably only leaves RSA key exchange, depending on your OpenSSL version.
Regarding ciphers, you should probably test on your intended EC2 hardware. Without hardware acceleration, you should probably prefer RC4 over AES128 over AES256 over anything else, at least according to this benchmark.
I also suggest reading this wonderful post, especially the enlightening first diagram showing the impact of DH on TLS handshake performance.
Lastly, make sure you're using TLS session caching. That saves some CPU, too.
I just realized Amazon's Elastic Load Balancer is super slow for SSL Termination... I did a simple test on www.blitz.io (no relation, just a customer) with 1 to 250 concurrent connections over 1 minute. It failed horribly... But if I do TCP 443 on front end of ELB and TCP 443 on backend with no certificate, it wipes out a small instance's CPU when running IIS and an SSL cert on that instance. I need just handshakes, it's a simple web service serving clients from all over the place. New connection setup and teardown every time.
How can I design a high traffic SSL web service, preferably with SSL all the way to the backend for strict security compliance?
The performance of Node.js' https server is very similar to Pound, stunnel and stud, and there is no clear advantage to that approach.
I'm also wondering how to do this effectively. AWS SSL termination is dreadfully slow, but perhaps there is some way to improve its performance. Stud seemed promising but, like you mentioned, it also has a large CPU cost.