Estimating commodity hardware for an application
Suppose I wanted to develop a website like Stack Overflow. How do I estimate the amount of commodity hardware required to support it, assuming 1 million requests per day? Are there any case studies that explain the performance improvements possible in this situation?
I know the I/O bottleneck is the major bottleneck in most systems. What are the possible options to improve I/O performance? A few that I know of are:
- caching
- replication
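As a rough starting point, capacity estimates like this usually begin with a back-of-envelope calculation. The sketch below shows one way to do it; the peak factor and per-server throughput are illustrative assumptions, not measured numbers:

```python
# Back-of-envelope capacity estimate (illustrative numbers only).
requests_per_day = 1_000_000
seconds_per_day = 24 * 60 * 60

avg_rps = requests_per_day / seconds_per_day       # ~11.6 req/s on average
peak_factor = 10                                   # assumed traffic spikiness
peak_rps = avg_rps * peak_factor                   # ~116 req/s at peak

# Assume (hypothetically) one commodity server sustains 500 req/s
# with headroom; round up and keep one spare for redundancy.
server_capacity_rps = 500
servers = -(-peak_rps // server_capacity_rps) + 1  # ceiling division + 1 spare

print(f"avg {avg_rps:.1f} req/s, peak {peak_rps:.0f} req/s, servers: {int(servers)}")
```

Even with a generous peak factor, 1 million requests per day lands at a load a single commodity box can handle; redundancy, not raw throughput, is what drives the server count here.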
You can improve I/O performance in several ways depending upon what you use for your storage setup:
You may want to look at the "Lessons Learned" section of the StackOverflow Architecture post.
Check out this handy tool:
http://www.sizinglounge.com/
and another guide from Dell:
http://www.dell.com/content/topics/global.aspx/power/en/ps3q01_graham?c=us&l=en&cs=555
If you want your own Stack Overflow-like community, you can sign up with StackExchange.
You can read some case studies here:
High Scalability - How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
http://www.gear6.com/gear6-downloads?fid=56&dlt=case-study&ls=Veoh-Case-Study
1 million requests per day is about 12 per second. Stack Overflow is small enough that you could (with interesting normalization and compression tricks) fit it entirely in the RAM of a 64 GB Dell PowerEdge 2970. I'm not sure where caching and replication would play a role.
If normalization proves too much trouble, a PowerEdge R900 with 256 GB is available.
If you don't like a single point of failure, you can connect a few of those and just push updates over a socket (preferably on a separate network card). Even a peak load of 12K/second should not be a problem for a main-memory system.
The best way to avoid the I/O bottleneck is to not do I/O (as much as possible). That means a prevayler-like architecture with batched writes (losing a few seconds of data is acceptable): basically a log file, also written out to a socket for replication.
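That batched-write idea can be sketched roughly as follows. This is a hypothetical design, not Prevayler's actual API: mutations accumulate in memory and hit the log file one batch at a time, optionally mirrored to a replica socket:

```python
import json
import socket
from typing import Optional

class BatchedLog:
    """Append mutations to a log file in batches; optionally mirror each
    batch to a replica over a socket (hypothetical replication scheme)."""

    def __init__(self, path: str, batch_size: int = 100,
                 replica: Optional[socket.socket] = None):
        self.file = open(path, "a", encoding="utf-8")
        self.batch_size = batch_size
        self.replica = replica
        self.pending: list[str] = []

    def append(self, entry: dict) -> None:
        self.pending.append(json.dumps(entry))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        data = "\n".join(self.pending) + "\n"
        self.file.write(data)   # one write syscall per batch, not per entry
        self.file.flush()       # OS-level flush; os.fsync would be stronger
        if self.replica is not None:
            self.replica.sendall(data.encode("utf-8"))
        self.pending.clear()
```

Losing the last few seconds of data on a crash is exactly the trade-off described above; calling `os.fsync` on every flush would narrow that window at the cost of throughput.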