Thoughts on a highly scalable and modular distributed server-side architecture
Mine is not really a question, it's more of a call for opinions - and perhaps this isn't even the right place to post it. Nevertheless, the community here is very informed, and there's no harm in trying...
I was thinking about ways to create a highly scalable and, above all, highly modular back-end architecture. For example, an entire back-end ecosystem for a large site that had the potential for future-proof evolution into a massive site.
This would entail a very high degree of separation of concerns, to the extent that not only could (say) the underlying DB be replaced (i.e. from Oracle to MySQL), but the actual type of database could be replaced (e.g. SQL to KV, or vice versa).
I envision a situation where each sub-system exposes its own API within the back-end ecosystem. In this way, the API could remain constant, whilst the implementation could change (even radically) over time.
The system must be heterogeneous in that it's not tied to a specific language. It must be able to accommodate modules or entire sub-systems using different languages.
It then occurred to me that what I was imagining was simply the architecture of the web itself.
So here is my discussion point: apart from the overhead of using (mainly) text-based protocols, is there any overriding reason why a complex back-end architecture should not be implemented in the manner I describe, or is there some strong rationale I'm missing for using communication frameworks and protocols such as Twisted, AMQP, Thrift, etc.?
UPDATE: Following a comment from @meagar, I should perhaps reformulate the question: are the clear advantages of using a very simple, flexible and well-understood architecture (i.e. all functionality exposed as a series of RESTful APIs) enough to compensate for the obvious performance hit incurred when using this architecture in a back-end context?
Comments (1)
[code]the actual type of database could be replaced (e.g. SQL to KV, or vice versa).[/code]
And anyone who wrote a join between two tables will be sad. If you want the "ability" to switch to KV, then you should not expose an API richer than what KV can support.
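As a rough illustration of that point, here is a minimal Python sketch of a storage contract that offers only what a KV store can support, with two interchangeable backends behind it. All names here (`KeyValueStore`, `DictStore`, `SqliteStore`, `rename_user`, the `kv` table) are hypothetical and chosen for the example; the in-memory dict merely stands in for a real KV database, and SQLite for a real SQL one.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical storage contract: nothing richer than a KV store offers."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...
    def delete(self, key: str) -> None: ...


class DictStore:
    """In-memory backend, standing in for a real KV database."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SqliteStore:
    """SQL backend exposing the same three operations, and no joins."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self._db.commit()

    def delete(self, key: str) -> None:
        self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        self._db.commit()


def rename_user(store: KeyValueStore, user_id: str, new_name: str) -> Optional[str]:
    """Caller code sees only the contract, so either backend can be swapped in."""
    key = f"user:{user_id}:name"
    old_name = store.get(key)
    store.put(key, new_name)
    return old_name
```

Because `rename_user` never touches anything outside `get`, `put` and `delete`, it behaves the same with either backend. The moment a caller needs a join, the contract has to grow, and the KV option is gone.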
The answer to your question depends on what you're trying to accomplish. You want to keep each module within reasonable bounds. Use proper physical layering of code, use defined interfaces with side-effect contracts, and write test cases for each success and failure case of each interface. That way, you can depend on things like "when a user enters the blah page, a user-blah fact is generated so that all registered fact listeners will be invoked." This allows you to extend the system without having direct calls from point A to point B, while still having some kind of control over widely disparate dependencies. (I hate code bases where you can't use find-all to find all possible references to a symbol!)
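The "fact listener" idea could be sketched in-process roughly like this. It is only an assumption about what such a registry might look like: `FactBus`, `register` and `publish` are names invented for the example, not an existing library.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[dict], None]


class FactBus:
    """Hypothetical in-process registry mapping fact names to listeners."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def register(self, fact_name: str, listener: Listener) -> None:
        """Subscribe a listener; the publisher never learns who is listening."""
        self._listeners[fact_name].append(listener)

    def publish(self, fact_name: str, payload: dict) -> int:
        """Invoke every registered listener and return how many were called."""
        listeners = list(self._listeners.get(fact_name, ()))
        for listener in listeners:
            listener(payload)
        return len(listeners)


# Usage: the page handler publishes a fact; an audit module reacts to it
# without the handler calling the audit module directly.
bus = FactBus()
audit_log: List[str] = []
bus.register("user-blah", lambda fact: audit_log.append(fact["user"]))
notified = bus.publish("user-blah", {"user": "alice"})
```

Keeping the fact name as a plain string constant means a find-all on that name still turns up every publisher and every listener.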
However, the reason we put lots of code and classes into a single system is that calling between systems is often very, very expensive. You want to think in terms of code modules making requests of each other where you can. The difference in timing between a function call and a REST call is something like one to a million (maybe you can get it as low as one to ten thousand if you only count cycles, not wall-clock time, but I'm not so sure). Also, anything that goes on a wire in a data center may suffer from packet loss, because there is no such thing as a 100% loss-free data center, no matter how hard you try. Packet loss means random latency spikes in the response time of your application.
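To get a feel for how packet loss adds up once a single page view fans out into many internal calls, here is a back-of-the-envelope sketch. It assumes each packet is lost independently with the same probability, which real networks only approximate, and the loss rate, call count and packets-per-call figures are invented for illustration, not measurements.

```python
def p_any_loss(loss_rate: float, packets: int) -> float:
    """Probability that at least one of `packets` independent packets is lost."""
    return 1.0 - (1.0 - loss_rate) ** packets


# Hypothetical numbers: 0.01% loss per packet, and one page view that
# triggers 50 internal REST calls of about 4 packets each.
LOSS_RATE = 0.0001
CALLS_PER_PAGE = 50
PACKETS_PER_CALL = 4

p_spike = p_any_loss(LOSS_RATE, CALLS_PER_PAGE * PACKETS_PER_CALL)
print(f"Chance that a page view hits at least one lost packet: {p_spike:.2%}")
```

Under these assumptions roughly one page view in fifty waits on a retransmission somewhere, even though each individual packet almost always gets through. An in-process function call has no equivalent failure mode.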