I am new to web development and am currently interviewing with companies. The question people seem to ask most often is:
How do you scale your web server if it starts hitting a million queries?
What would you do if you have just one database instance running at that time? How do you manage that?
These questions are really interesting and I would like to learn about them.
Please share your suggestions / practices (that you follow) for such scenarios.
Thank you
How to scale:
Typical Scaling Options:
Database Scaling Options:
At the most basic level, scaling web servers means writing your app so that it can run on more than one machine, and then throwing more machines at the problem. No matter how much you tune individual servers, scaling will eventually involve a farm of web servers.
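A minimal sketch of the "more machines" idea, assuming the app is stateless (no sessions or uploads kept only in one server's memory or disk), so any request can go to any machine. The backends here are hypothetical in-process functions standing in for separate servers, and the balancer simply rotates through them (round-robin); in practice a load balancer such as Nginx or HAProxy does this job.

```python
import itertools
from typing import Callable, List

Handler = Callable[[str], str]


def make_backend(name: str) -> Handler:
    """Hypothetical stand-in for one web server in the farm."""
    def handle(path: str) -> str:
        return f"{name} served {path}"
    return handle


class RoundRobinBalancer:
    """Sends each request to the next backend in turn."""

    def __init__(self, backends: List[Handler]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._next_backend = itertools.cycle(backends)

    def handle(self, path: str) -> str:
        # Works only because every backend is an identical, stateless copy.
        return next(self._next_backend)(path)


farm = RoundRobinBalancer([make_backend(f"web{i}") for i in range(3)])
for i in range(4):
    print(farm.handle(f"/page/{i}"))
# web0 served /page/0
# web1 served /page/1
# web2 served /page/2
# web0 served /page/3
```

Adding capacity then means adding another backend to the list, which is exactly why the app must not depend on any one machine.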
The database issue is much stickier to deal with. What is your read/write ratio? What kind of application is this: OLTP? OLAP? Social media? Which database is it? How do we add more servers to handle the load? Do we partition our data across multiple databases, or replicate all changes to lots of slaves?
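To make the partition-versus-replicate question concrete, here is a sketch that does both, assuming hypothetical host names and that every query can be keyed by something like a user ID. The key is hashed to pick a shard (partitioning), writes go to that shard's primary, and reads rotate over its replicas (replication). Replicas mainly help a read-heavy app, because every write still has to reach every copy; partitioning is what spreads writes.

```python
import hashlib
import itertools
from typing import List


class Shard:
    """One partition of the data: a primary that takes writes, plus replicas
    ("slaves") that receive its changes and serve reads."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = replicas
        # With no replicas, reads fall back to the primary.
        self._read_hosts = itertools.cycle(replicas or [primary])

    def next_read_host(self) -> str:
        return next(self._read_hosts)


class ShardRouter:
    def __init__(self, shards: List[Shard]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self.shards = shards

    def shard_for(self, key: str) -> Shard:
        # A stable hash (Python's built-in hash() of str is salted per process).
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def host_for_write(self, key: str) -> str:
        return self.shard_for(key).primary

    def host_for_read(self, key: str) -> str:
        return self.shard_for(key).next_read_host()


router = ShardRouter([
    Shard("db0", ["db0-replica1", "db0-replica2"]),
    Shard("db1", ["db1-replica1"]),
])
print(router.host_for_write("user:42"), router.host_for_read("user:42"))
```

Two caveats: with `hash % number_of_shards`, adding a shard remaps most keys (consistent hashing is the usual alternative), and replicas typically lag the primary, so a read right after a write may not see it yet.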
Your questions raise more questions. In an interview, if someone just "has the answer" to a generic question like the ones you've posted, they probably know only one way of doing things, and that way may or may not be the best one.
There are a few approaches I'd take to the first question:
Are there hardware upgrades that could quickly add enough capacity to handle the million queries? If so, this is likely the first thing to investigate.
Are there software changes that could optimize the server's performance? IIS, for example, has a large number of settings that can improve performance to some extent.
Consider moving to a web farm rather than using a single server. At one job I actually had a situation with millions of hits a minute that was thrashing our web servers badly and taking down a number of sites. Our solution was to change the load balancer so that a few of the servers served the site causing the thrashing, while the other servers kept the remaining sites up. This was in the fall, which in retail is your big quarter. While some would start here, I'd likely come here last, as this can open a big can of worms compared to the other two options.
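The load-balancer change described above amounts to routing by site. A minimal sketch, assuming the sites are told apart by the HTTP Host header and using hypothetical host and server names; in a real setup this lives in the load balancer's configuration rather than in application code.

```python
import itertools
from typing import Dict, List


class PoolRouter:
    """Routes each request to a server pool chosen by its Host header, so a
    site under extreme load can only exhaust its own pool."""

    def __init__(self, dedicated: Dict[str, List[str]], shared: List[str]) -> None:
        self._dedicated = {
            host.lower(): itertools.cycle(servers) for host, servers in dedicated.items()
        }
        self._shared = itertools.cycle(shared)

    def pick(self, host: str) -> str:
        # Round-robin within whichever pool the host belongs to.
        return next(self._dedicated.get(host.lower(), self._shared))


router = PoolRouter(
    dedicated={"hot-site.example.com": ["web1", "web2"]},  # the site doing the thrashing
    shared=["web3", "web4", "web5", "web6"],               # everything else
)
print(router.pick("hot-site.example.com"))    # web1
print(router.pick("other-site.example.com"))  # web3
```

If the hot site melts its pool, requests for every other site still land on servers it cannot reach.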
As for the database instance, I'd consider a similar set of options, though I might try the multi-server option first, since redundancy may be an important side benefit here, and I'm not sure it is as easy to get with a web server. I may be way off, but that is how I'd initially tackle this.
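Redundancy is what makes a second database server pay off even before load becomes the problem. A minimal sketch of client-side failover, assuming hypothetical server names and a driver that signals a dead server with Python's built-in `ConnectionError` (real drivers have their own exception types).

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def with_failover(servers: List[str], run: Callable[[str], T]) -> T:
    """Run a query against the first database server that answers."""
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return run(server)
        except ConnectionError as exc:  # assumption: the driver raises ConnectionError
            last_error = exc            # this server is down; try the next one
    raise ConnectionError("all database servers failed") from last_error


def fake_query(server: str) -> str:
    """Hypothetical query function: the primary is down, the standby answers."""
    if server == "db-primary":
        raise ConnectionError(f"{server} unreachable")
    return f"rows from {server}"


print(with_failover(["db-primary", "db-standby"], fake_query))  # rows from db-standby
```

This is only safe for reads, or for writes once the standby has actually been promoted to primary; with asynchronous replication the standby may be missing the last few changes.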
Use a caching proxy
If you serve identical pages to all visitors (say, a news site), you can reduce load by an order of magnitude by caching generated content with a caching proxy such as Varnish or Apache Traffic Server.
The proxy sits between your server and your visitors. If you get 10,000 hits to your front page, it only has to be generated once; the proxy sends the same response to the other 9,999 visitors without asking your app server again.
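The effect is easy to see in a toy model. This sketch, with a hypothetical `app_server` function standing in for the real application, keeps each generated page for a fixed time-to-live and serves repeats from memory. It is not how Varnish or Traffic Server are implemented, and real proxies also respect headers such as `Cache-Control`.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingProxy:
    """Serves a stored copy of each page until it expires; only calls the
    origin (app server) on a miss."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float = 60.0) -> None:
        self._origin = origin
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> str:
        now = time.monotonic()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]                 # hit: the app server is not involved
        body = self._origin(path)            # miss: generate the page once
        self._store[path] = (now + self._ttl, body)
        return body


generated: List[str] = []


def app_server(path: str) -> str:
    """Hypothetical, expensive page generator."""
    generated.append(path)
    return f"<html>front page for {path}</html>"


proxy = CachingProxy(app_server, ttl_seconds=60)
for _ in range(10_000):
    proxy.get("/")
print(len(generated))  # 1
```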
Before developers start building the system,
they will probably consider the server specifications.
Maybe you can cut back on SEO and block search engines from crawling the site
(crawling is a task that takes a lot of resources).
Try to index everything well, and avoid making search too easy to trigger.
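If crawlers are a significant share of the load, `robots.txt` is the usual way to ask them to stay away. The sketch below blocks a hypothetical `/search` path (`Disallow: /` would block the whole site) and checks the rules with Python's standard `urllib.robotparser`. Keep in mind robots.txt is advisory: well-behaved crawlers honor it, but nothing enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers off the expensive search pages
# while the rest of the site can still be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/search?q=shoes"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/products/42"))     # True
```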
Deploy it on the cloud, and make sure your web server and web app are cloud-ready and can scale across different nodes. Note that a platform may dictate your stack: Google's cloud (appspot), for example, needs your web app to be in Python or Java. I recommend the Cherokee web server (load balancing across different servers is very easy, and benchmarks show it to be faster than Apache).
Use a caching proxy, e.g. Nginx.
For the database, use memcache for queries that are expected to be repeated.
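The memcache suggestion is usually implemented as cache-aside: check the cache, and only on a miss run the query and store the result. In this sketch a plain dict with expiry times stands in for memcached, and an in-memory SQLite table stands in for the real database (both assumptions). With memcached, the dict lookups become the client's get/set calls and the cache is shared by all web servers.

```python
import sqlite3
import time
from typing import Any, Dict, List, Tuple

CacheKey = Tuple[str, Tuple[Any, ...]]


class QueryCache:
    """Cache-aside in front of a database connection."""

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 30.0) -> None:
        self.conn = conn
        self.ttl = ttl_seconds
        self._cache: Dict[CacheKey, Tuple[float, List[tuple]]] = {}  # stand-in for memcached
        self.db_hits = 0

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[tuple]:
        key = (sql, params)
        now = time.monotonic()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]                 # served from cache, database untouched
        self.db_hits += 1
        rows = self.conn.execute(sql, params).fetchall()
        self._cache[key] = (now + self.ttl, rows)
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles (title) VALUES (?)", [("Hello",), ("Scaling",)])

cache = QueryCache(conn)
for _ in range(100):
    rows = cache.query("SELECT id, title FROM articles ORDER BY id")
print(rows, cache.db_hits)  # [(1, 'Hello'), (2, 'Scaling')] 1
```

The hard part is invalidation: after a write, either delete the affected keys or accept results up to `ttl_seconds` old.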
If the company wants its data to be private, build a private cloud. Ubuntu does a very good job here, and it is fully free and open source: http://www.ubuntu.com/cloud/private