REST or SOAP for a project with a high volume of transactions and large data
Project Requirements:
I need to build a web service that will receive chunks of text data, anywhere from a few kilobytes up to 10 megabytes, with the average hovering around a few hundred kilobytes. There will be a high volume of transactions, and I'd like to secure the service with authentication, SSL, and eventually some encryption.
I'd like the service to support XML and JSON if possible, and I'd like to build it using open source tools - most likely PHP/MySQL.
Suggestions regarding possible PHP frameworks are welcome.
I had already started down the path of building a REST web service in PHP with the Tonic framework, but I thought that I might be limited in the amount of data I could post in a URI, and posting so much data in a URL sounded cumbersome. So I've been looking at SOAP solutions.
Questions:
Am I limited in the amount of data I can post to a REST service?
Are there data limits to the SOAP protocol?
Do you recommend I use SOAP or REST based on the project requirements and why?
Comments (1)
HTTP isn't limited at all at the protocol level, whether you're using the REST architecture or the SOAP protocol tunneled over HTTP. Either way, the data travels in the body of a POST or PUT request, not in the URI.
Your major limit will be memory, especially if you're streaming several multi-megabyte payloads. With a clever enough handler, and a compliant request, you could choose to stream smaller requests to memory only and larger ones directly to disk before processing.
Details like that likely limit how much whatever framework you choose can do for you, since most will likely consume the request straight into memory. Whether you can change that at all, or on a request-by-request basis, is up to the framework and possibly your desire to change it.
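As a rough illustration of the memory-or-disk idea, here is a minimal sketch in Python (not PHP, purely for illustration). It assumes the handler gets the request body as a readable binary stream; the `spool_body` name and the 1 MiB cutoff are made up for this example. It leans on the standard library's `tempfile.SpooledTemporaryFile`, which buffers in memory and moves to a temporary file on disk once the data grows past `max_size`.

```python
import io
import tempfile

MEMORY_LIMIT = 1024 * 1024  # assumed cutoff: bodies larger than 1 MiB go to disk
CHUNK_SIZE = 64 * 1024      # read the body in 64 KiB pieces, never all at once


def spool_body(stream, memory_limit=MEMORY_LIMIT, chunk_size=CHUNK_SIZE):
    """Copy a request body into a buffer that spills to disk past memory_limit.

    `stream` is any object with a binary read(n) method (hypothetical stand-in
    for whatever your server hands you as the request input).
    """
    spool = tempfile.SpooledTemporaryFile(max_size=memory_limit)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)  # rewind so the caller can read the payload from the start
    return spool


# Usage: a small body stays in RAM, a large one ends up in a temp file.
small = spool_body(io.BytesIO(b"a" * 10_000))
large = spool_body(io.BytesIO(b"b" * (3 * 1024 * 1024)))
```

Whether your framework lets you hook in at this level is a separate question; the point is only that the handler never holds more than one chunk plus the in-memory buffer.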
The choice of SOAP as a protocol is likely irrelevant. It's just a simple XML envelope around your payload, unless you also choose to encode your messages with the SOAP encodings.
Since you want to support JSON as well, a SOAP stack would likely just get in the way compared with a raw HTTP processing backend.
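To show what "raw HTTP processing" can look like for both formats, here is a small Python sketch that picks a parser from the Content-Type header. The `parse_payload` name and the flat one-level XML layout are assumptions for the example, not anything from your project, and the standard-library XML parser is not hardened against malicious input.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(body: bytes, content_type: str) -> dict:
    """Decode a request body into a dict based on its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body.decode("utf-8"))
    if media_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        # Assumed layout: a root element with one level of child elements.
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported media type: {media_type}")


# Usage: the same logical document arrives in either format.
as_json = parse_payload(b'{"title": "a", "text": "b"}', "application/json")
as_xml = parse_payload(b"<doc><title>a</title><text>b</text></doc>", "text/xml")
```

No envelope is needed in either case; the header alone tells the handler how to read the body.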
You don't mention the complexity of your API, but you could likely get by with a framework for much of it and just write a custom handler for the large, heavy transactions.
Your transactions per second won't be high, since the transactions themselves will be so large. If you're streaming to disk, you'll have a large I/O impact as well (you're going to have that anyway unless you're converting the data in RAM and spitting it back out the socket rather than saving it). So I/O throughput is likely going to be the major bottleneck limiting your overall performance.
If you're getting uploads from consumers, you'll probably be getting about 150 KB per second from each (lots of consumer upload speeds suck -- mine, for example). If you have an I/O channel that can push, say, 60 MB/sec sustained, that's 400 simultaneous connections. So that should give you an idea of machine scaling based on anticipated transaction volumes (of course, you've already done this analysis -- I'm just making this stuff up as I go). With each transaction uploading 300 KB, that's 2 seconds per transaction, and 400 connections each finishing every 2 seconds comes to 200 TPS. That's another number to play with (in terms of what your database can sustain, for example, especially if you're going to have several machines feeding it).
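The back-of-the-envelope numbers above can be put into a few lines of Python so you can plug in your own figures. The function name is made up for this sketch, and the inputs are the same guesses as in the text, not measurements.

```python
def capacity_estimate(io_bytes_per_sec, client_bytes_per_sec, payload_bytes):
    """Return (simultaneous connections, seconds per transaction, TPS)."""
    # How many uploads the I/O channel can absorb at once.
    connections = io_bytes_per_sec / client_bytes_per_sec
    # How long one upload takes at the client's upload speed.
    seconds_per_transaction = payload_bytes / client_bytes_per_sec
    # Each connection completes one transaction per upload interval.
    tps = connections / seconds_per_transaction
    return connections, seconds_per_transaction, tps


# 60 MB/s channel, 150 KB/s per client, 300 KB per transaction
connections, seconds, tps = capacity_estimate(60_000_000, 150_000, 300_000)
print(connections, seconds, tps)  # 400.0 2.0 200.0
```

Note that the client speed cancels out of the TPS figure: it is just the channel throughput divided by the payload size, so faster clients reduce the number of open connections but not the sustained rate.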
Modern CPUs are disgustingly fast, so PHP or whatever should be fine -- that won't be your problem.