I have to build a web application that's similar to a "branch/store locator". A user types in his address and the web application plots nearby stores on a map.
One of the requirements is:
"The web application must support 100 simultaneous users and up to 5GB/day transfer volume."
Much of the transferred data will be text and GUI images.
So my questions are:
- Is this considered a high traffic application?
- What web application/site can I look to for comparable traffic?
- Do I need to implement things like memcached, template caching, load balancing, etc.?
I've worked on high traffic applications before, but I was never the architect. So although I'm aware of some (not all) strategies for managing high-traffic scenarios, I'm not familiar with their actual implementation.
Can someone offer me advice, feedback, or suggested research? Did I overlook anything?
Also, I'm building this with LAMP and Smarty.
Comments (3)
That's only about 60 KB per second on average (assuming the traffic is spread over 24 hours), but you'll likely hit bursts during peak hours, so you'll need to be able to handle those. 100 simultaneous users is nothing, even for an older Apache-based server.
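To make that estimate concrete, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 1,000,000 KB), and the `peak_factor` of 10 is purely a hypothetical burst multiplier for illustration, not a figure from the requirement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_kb_per_s(gb_per_day: float) -> float:
    """Average transfer rate in KB/s, using decimal units (1 GB = 1,000,000 KB)."""
    return gb_per_day * 1_000_000 / SECONDS_PER_DAY


def peak_rate_kb_per_s(gb_per_day: float, peak_factor: float) -> float:
    """Rate during a burst, modelled as a simple multiple of the average (an assumption)."""
    return average_rate_kb_per_s(gb_per_day) * peak_factor


avg = average_rate_kb_per_s(5)
peak = peak_rate_kb_per_s(5, peak_factor=10)  # hypothetical 10x burst
print(f"average: {avg:.1f} KB/s")
print(f"peak at 10x: {peak:.1f} KB/s")
```

Even with a generous burst multiplier the result stays well under 1 MB/s, which is why the bandwidth requirement on its own is not a concern.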
I'm not sure memcached would really help you that much, but it's worth adding, as is APC for your PHP caching. I would at least architect the application so that it can be load balanced; check out ultramonkey for some good documentation on how to get that going transparently. You'll need to ensure that no user session stores its data in a per-host way: consider what happens if a user hits load-balanced server A in one call and then hits server B in another (i.e. store the user ID and session data in the DB, not in the filesystem).
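The session point can be illustrated with a small sketch. It is written in Python for brevity, with an in-memory SQLite database standing in for the shared MySQL server; the `DbSessionStore` class and the `sessions` table are hypothetical names. In PHP the equivalent hook would be a custom handler registered with `session_set_save_handler`.

```python
import json
import sqlite3
import uuid


class DbSessionStore:
    """Keeps session data in a shared database so any web server can read it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def create(self, data: dict) -> str:
        """Store a new session and return its ID (sent to the browser as a cookie)."""
        session_id = uuid.uuid4().hex
        self.save(session_id, data)
        return session_id

    def save(self, session_id: str, data: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()

    def load(self, session_id: str):
        """Return the session data, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


# One shared connection stands in for the database both hosts talk to.
shared_db = sqlite3.connect(":memory:")
server_a = DbSessionStore(shared_db)
server_b = DbSessionStore(shared_db)

sid = server_a.create({"user_id": 42})  # first request lands on server A
print(server_b.load(sid))               # next request lands on server B
```

Because neither "server" keeps the session on its own disk, the load balancer is free to send each request to either host.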
Before diving into the hardcore server side stuff (load balancing and memcached) make sure you understand and implement all (or most) rules from YSlow: http://developer.yahoo.com/yslow/help/
Then, if MySQL is a bottleneck, get a copy of High Performance MySQL or simply read on how to tune your queries/design at www.mysqlperformanceblog.com.
5GB per day is not that much.
On a 100Mbps link you can transfer up to roughly 1 TB per day (12.5 MB/s × 86,400 s). 5GB/day represents roughly 0.5% of that, so I don't think you have to care about traffic at this scale, as there are so many things likely to become your bottleneck before that (JavaScript, database access, etc.). Furthermore, once you know you have enough bandwidth available, you should not worry about such things before writing (and benchmarking) your app, as this may lead to premature optimization.
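The 0.5% figure can be checked with a short calculation. This sketch assumes decimal units and a link that could be saturated around the clock, which real links never quite achieve, so treat the result as an upper bound.

```python
def link_capacity_gb_per_day(link_mbps: float) -> float:
    """Theoretical daily capacity of a link in GB (decimal units, no protocol overhead)."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 / 1_000_000_000


capacity = link_capacity_gb_per_day(100)
share = 5 / capacity  # fraction of the link used by 5 GB/day
print(f"{capacity:.0f} GB/day, 5 GB is {share:.2%} of that")
```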
I found the scaling section in the Django book interesting as a general piece of knowledge in this field.