High-volume transactional web application in Java
I have next to no experience dealing with high volume transactional websites and recently came across this interesting question. I am interested in knowing where the bottlenecks in a Java web application would occur under high load (thousands of requests per second). If someone could give me a high level approach to thinking about the following question, that would be great!
The only thing I've come up with is to use memcached to cache the database look-ups but I don't know how to calculate the amount of time each request will take and therefore how many requests per second the system might be able to handle.
Question:
Internet-scale applications must be designed to process high volumes of transactions. Describe a design for a system that must process on average 30,000 HTTP requests per second.
For each request, the system must perform a look-up into a dictionary of 50 million words, using a key word passed in via the URL query string. Each response will consist of a string containing the definition of the word (100 bytes or less).
Describe the major components of the system, and note which components should be
custom-built and which components could leverage third-party applications. Include hardware estimates for each component. Please note that the design should include maximum performance at minimum hardware / software-licensing costs.
Document the rationale in coming up with the estimates.
Describe how the design would change if the definitions are 10 kilobytes each.
Comments (2)
As background you may note benchmarks such as SPECmarks. Compared with your scenario there is significantly more processing, but you will see that your 30,000 req/sec is a comparatively high, but not insanely high, figure.
You may also find Joines et al. useful. (Disclaimer: they're colleagues.)
In your scenario I would expect in descending order of cost:
You're not doing complex processing (e.g. graphic rendering or rocket-science type math). So first guess: if your dictionary were a database, then the cost of doing a query is going to dominate everything else. Traditionally, when we hit bottlenecks in the web/app server tier we scale by adding more instances, but if the database is the bottleneck, that's more of a problem. So one direction: what performance can you expect from a database engine? Does 30k tps seem feasible?
Your first observation: caching stuff is a commonly used strategy. Here you have (presumably) random hits across the whole dictionary, hence caching recent answers by itself is probably not going to help, unless ... can you cache the whole thing?
50,000,000 * (100 + overhead) == ??
On a 64-bit JVM on a 64-bit OS, maybe it fits?
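To make that `50,000,000 * (100 + overhead)` arithmetic concrete, here is a minimal back-of-envelope sketch. The per-entry overhead figure is an assumption: key `String` objects, map entries, and object headers on a 64-bit JVM are commonly estimated at roughly 100 bytes per entry, but the real number depends on the JVM and data structure.

```java
// Back-of-envelope estimate: can the whole dictionary fit in one JVM heap?
public class CacheSizeEstimate {
    public static void main(String[] args) {
        long entries = 50_000_000L;
        long definitionBytes = 100;       // "100 bytes or less" per the question
        long assumedOverheadBytes = 100;  // key String + map entry + headers (rough guess)

        long totalBytes = entries * (definitionBytes + assumedOverheadBytes);
        System.out.printf("~%.1f GB needed%n", totalBytes / 1e9);  // ~10.0 GB
    }
}
```

At ~10 GB, a single large-heap machine is plausible; if the assumed overhead doubles, you are at ~15 GB and sharding starts to look attractive.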
If not (and as the data gets really big, then probably not), then we need to scale. Hence a strategy of slicing the cache may be used. Have (for example) 4 servers, serving A-F, G-M, N-S, T-Z respectively (and, note, 4 separate caches or 4 separate databases). Have a dispatcher directing the requests.
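The dispatcher idea above can be sketched as simple range partitioning on the first letter of the keyword. The shard boundaries, host names, and lower-casing here are illustrative assumptions, not part of the original answer:

```java
import java.util.List;

// Route each keyword to one of 4 cache/database shards by its first letter.
public class ShardDispatcher {
    private final List<String> shardHosts;  // assumed 4 backends: A-F, G-M, N-S, T-Z

    public ShardDispatcher(List<String> shardHosts) {
        this.shardHosts = shardHosts;
    }

    public String shardFor(String word) {
        char first = Character.toLowerCase(word.charAt(0));
        int index;
        if (first <= 'f')      index = 0;
        else if (first <= 'm') index = 1;
        else if (first <= 's') index = 2;
        else                   index = 3;
        return shardHosts.get(index);
    }
}
```

Note that letters are not uniformly distributed across words, so hashing the keyword (e.g. `Math.floorMod(word.hashCode(), n)`) would spread load more evenly than alphabetic ranges, at the cost of losing human-readable shard boundaries.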
The first thing I would do is question the numbers. English has about 170,000 words in common use. Add all the other common languages and you would have no more than a couple of million. If this is not the case, you could cache the most common words in a fast cache and the less common words in a slower cache. Even at 30K requests per second, it would take about 30 minutes to request every unique word.
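The fast-cache/slow-cache split above can be sketched with a small in-heap LRU in front of a slower store. The cache size and the `slowLookup` stand-in (which in practice would be memcached or a database shard) are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Two-tier lookup: a bounded LRU for hot words, falling back to a slow tier.
public class TwoTierDictionary {
    private final Map<String, String> hotCache;

    public TwoTierDictionary(int hotEntries) {
        // access-order LinkedHashMap that evicts its eldest entry past the cap
        this.hotCache = new LinkedHashMap<>(hotEntries, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > hotEntries;
            }
        };
    }

    public synchronized String define(String word) {
        String def = hotCache.get(word);
        if (def == null) {
            def = slowLookup(word);  // e.g. memcached or a database shard
            hotCache.put(word, def);
        }
        return def;
    }

    private String slowLookup(String word) {
        return "definition of " + word;  // placeholder for the slow tier
    }
}
```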
Basically, there is no point designing a large system if the numbers are not real.
On a 64-bit JVM this fits easily. 50 million * (100 + overhead) is about 10 GB (the overhead is likely to be high, as you need to hold the key and index the data). A 12 GB server costs about $2,500.
The problem is likely to be the number of requests. You will need multiple machines, but as other posters have suggested, the numbers are highly unlikely to be real. I don't imagine this service would be as expensive as Facebook, but you are likely to need tens to hundreds of servers to support this many requests.
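One way to bound the "tens to hundreds" estimate is to divide the target rate by an assumed per-server capacity. The 2,000 req/sec figure below is a pure assumption for a Java server doing in-memory lookups; real capacity must be measured by load testing:

```java
// Rough server-count estimate under an assumed per-server throughput.
public class ServerCountEstimate {
    public static void main(String[] args) {
        int targetRps = 30_000;
        int assumedRpsPerServer = 2_000;  // assumption; load-test to confirm

        int servers = (int) Math.ceil((double) targetRps / assumedRpsPerServer);
        System.out.println(servers + " servers, before redundancy headroom");  // 15
    }
}
```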