Better practice: put the load on the SQL server or the web server?
I'm the webmaster for a major US university. Our website, which I built and have been in charge of for the last 7 years or so, receives a great many requests. I've been building ever-more-complex features into it, and it has always been my practice to put as much of the programming burden as possible on our multi-processor Microsoft SQL Server - using stored procedures, views, etc. - and fill in what can't be done there with PHP, ASP, or Perl on the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long without anyone else to brainstorm with, I'm curious whether my approach is ideal for the even higher load situations we'll have in the future.
My question is: is it better practice to place more of the load on the SQL server using nested SELECT statements, views, stored procedures and aggregate functions, or should I be pulling multiple simpler queries and processing their results with server-side scripts such as PHP? Keep on keepin' on, or come up with a better way?
I've recently become more interested in performance after I ran some load traces and learned just how much I've been putting on the shoulders of the SQL server. Both the web server and the SQL server are fast and responsive throughout the day, almost regardless of how much I put on them, but by the time it becomes important I'd like to be ready: to have trained myself and to have upgraded my existing code with optimization best practices in mind.
Thanks for your advice and input.
Comments (4)
You put each layer in your stack to use in the domain it fits best.
There is no use in having your database server send 1000 rows and using PHP to filter them if a WHERE clause or GROUP BY clause would suffice. Conversely, it's not optimal to call the database to add two integers (SELECT 5+9 works fine, but PHP can do it itself, and you save the round trip).
You will probably want to look into scalability: which parts of your application can be divided into multiple processes? If you're still using just 2 layers (script and db), there is a lot of room for scaling there. But always start with the bottleneck.
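As a rough illustration of the WHERE/GROUP BY point above, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the real database server, and the `enrollments` table, its columns, and both function names are hypothetical. It only shows that the two approaches return the same answer while moving very different numbers of rows to the script.

```python
import sqlite3

# In-memory SQLite stands in for the real database server (an assumption
# made so the example is runnable; the table and columns are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (student TEXT, dept TEXT, credits INTEGER)")
rows = [(f"s{i}", "math" if i % 4 == 0 else "art", 3 + i % 2) for i in range(1000)]
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)", rows)


def total_credits_in_app(dept):
    """Fetch every row, then filter and sum in the script (wasteful)."""
    all_rows = conn.execute(
        "SELECT student, dept, credits FROM enrollments"
    ).fetchall()
    total = sum(credits for _, d, credits in all_rows if d == dept)
    return total, len(all_rows)


def total_credits_in_db(dept):
    """Let WHERE and SUM do the work; only one row reaches the script."""
    result = conn.execute(
        "SELECT COALESCE(SUM(credits), 0) FROM enrollments WHERE dept = ?",
        (dept,),
    ).fetchall()
    return result[0][0], len(result)


app_total, app_rows = total_credits_in_app("math")
db_total, db_rows = total_credits_in_db("math")
print("filtered in script:", app_total, "rows transferred:", app_rows)
print("filtered in database:", db_total, "rows transferred:", db_rows)
```

With a real network between the web server and the database, the difference in rows transferred is typically what matters most, but that is something to measure on your own setup rather than assume.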
Some examples: host static content on a CDN, use caching for your pages, read about nginx and memcached, use NoSQL (MongoDB), consider sharding, consider replication.
My opinion is that it's generally (mostly) best to favor letting the web servers do the processing. Two points:
First is scalability. Once your application gets enough usage, you'll need to start worrying about load balancing. And it's a lot easier to drop in a couple of extra web servers pointing to a common database than it is to set up a distributed database cluster. So it's best to take as much strain off the database as you can and keep it on a single machine for as long as possible.
The second point I'd like to make is about optimizing the queries. This will depend a lot on the queries you are using and on the database backend. When I first started working with databases, I fell into the trap of writing elaborate SQL queries with multiple JOINs that fetched exactly the data I wanted, even if it came from four or five different tables. I reasoned, "That's what the database is there for - let's get it to do the hard work."
I quickly found that these queries took way too long to execute, and often ended up blocking the database from serving other requests. While it may seem inefficient to split your query into multiple requests (for example in a for loop), you'll often find that executing multiple small queries that hit fast indexes makes your application run far more smoothly than trying to pass all the hard work to the database.
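To make the trade-off above concrete, here is a small Python sketch that fetches the same data both ways: once with a single JOIN, and once with several small indexed lookups combined in the script. SQLite stands in for the real database, and the `courses` and `sections` tables and the function names are hypothetical. The sketch only shows that the results match and how many queries each approach issues; which one is faster depends on your data, indexes, and database, so it is worth timing both.

```python
import sqlite3

# In-memory SQLite as a stand-in database; the schema is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sections (id INTEGER PRIMARY KEY, course_id INTEGER, room TEXT);
    CREATE INDEX idx_sections_course ON sections (course_id);
    INSERT INTO courses VALUES (1, 'Algebra'), (2, 'Drawing');
    INSERT INTO sections VALUES (1, 1, 'A101'), (2, 1, 'A102'), (3, 2, 'B201');
""")


def rooms_via_join():
    """One larger query: the database combines both tables."""
    sql = """
        SELECT c.title, s.room
        FROM courses c
        JOIN sections s ON s.course_id = c.id
        ORDER BY c.id, s.id
    """
    return conn.execute(sql).fetchall(), 1


def rooms_via_small_queries():
    """Several small indexed lookups, combined in the script."""
    queries = 1  # the initial query for the course list
    out = []
    courses = conn.execute("SELECT id, title FROM courses ORDER BY id").fetchall()
    for course_id, title in courses:
        queries += 1  # one extra lookup per course
        sections = conn.execute(
            "SELECT room FROM sections WHERE course_id = ? ORDER BY id",
            (course_id,),
        ).fetchall()
        out.extend((title, room) for (room,) in sections)
    return out, queries


join_rows, join_queries = rooms_via_join()
loop_rows, loop_queries = rooms_via_small_queries()
print(join_rows == loop_rows, join_queries, loop_queries)
```

Note that the loop issues one extra query per course, so its cost grows with the number of parent rows; each of those queries also pays a round trip when the database is on another machine.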
Firstly, you might want to check whether there is any load that can be removed entirely by client-side caching (.js, .css, static HTML and images) and by using technologies such as AJAX to do partial updates of screens - this will remove load from both the web and SQL servers.
Secondly, see whether there is SQL load that can be reduced by web server caching - e.g. static or low-refresh data. If you have a lot of 'content' pages on your systems, have a look at common CMS caching techniques, which will scale to allow many more users to view the same data without rebuilding the page or hitting the database.
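The web server caching idea above can be sketched as a tiny time-to-live cache placed in front of an expensive page build. This is only an illustration in Python: `TTLCache`, `load_news_page`, and the 60-second lifetime are all made-up names and values, and a real deployment would more likely use an existing cache such as memcached than a per-process dictionary.

```python
import time


class TTLCache:
    """Keep computed values for ttl_seconds before recomputing them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now, value)
        return value


stats = {"db_hits": 0}


def load_news_page():
    """Hypothetical stand-in for a page build that queries the database."""
    stats["db_hits"] += 1
    return "<ul><li>Campus news</li></ul>"


cache = TTLCache(ttl_seconds=60)
for _ in range(100):  # 100 page views within the cache lifetime
    page = cache.get_or_compute("news", load_news_page)

print(stats["db_hits"])  # the page was built once, not 100 times
```

The lifetime is the knob to tune: the longer it is, the fewer database hits, at the price of visitors seeing slightly stale content.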
I tend to do as much as possible outside the db, viewing db calls as expensive/time-intensive.
For example, when performing a select on a user table with fields name_given and name_family, I could fatten the query to return a column called full_name built by concatenation. But that kind of thing can easily be done in a model in your server-side scripting language (PHP, Ruby, etc).
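A minimal sketch of the full_name example, written in Python rather than PHP or Ruby so it can run on its own. The `User` class and the sample row are hypothetical, and SQLite stands in for the real database: SQLite concatenates with `||`, whereas SQL Server would use `+` or `CONCAT`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical model class; the field names follow the example above."""

    name_given: str
    name_family: str

    @property
    def full_name(self):
        # The concatenation happens in the script, not in the query.
        return f"{self.name_given} {self.name_family}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name_given TEXT, name_family TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada', 'Lovelace')")

# Option 1: "fatten" the query so the database builds full_name.
sql_full_name = conn.execute(
    "SELECT name_given || ' ' || name_family AS full_name FROM users"
).fetchone()[0]

# Option 2: select the plain columns and let the model build it.
row = conn.execute("SELECT name_given, name_family FROM users").fetchone()
user = User(*row)

print(sql_full_name == user.full_name)
```

Either way the result is the same string; the second option keeps the query simple and puts the formatting rule in one place in the application code.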
Of course, there are cases when the db is the more "natural" place to perform an operation. But, in general, I lean towards putting the load on the web server and optimizing there with many of the techniques noted in other answers.