Before you answer this I have never developed anything popular enough to attain high server loads. Treat me as (sigh) an alien that has just landed on the planet, albeit one that knows PHP and a few optimisation techniques.
I'm developing a tool in PHP that could attract quite a lot of users if it works out right. However, while I'm fully capable of developing the program, I'm pretty much clueless when it comes to making something that can deal with huge traffic. So here are a few questions on it (feel free to turn this question into a resource thread as well).
Databases
At the moment I plan to use the MySQLi features in PHP5. However, how should I set up the databases in relation to users and content? Do I actually need multiple databases? At the moment everything's jumbled into one database, although I've been considering moving user data to one, actual content to another, and core site content (template masters etc.) to a third. My reasoning is that sending queries to different databases will ease the load on each of them, since right now one database serves 3 load sources. Also, would this still be effective if they were all on the same server?
Caching
I have a template system that is used to build the pages and swap out variables. Master templates are stored in the database, and each time a template is called its cached copy (an HTML document) is served. At the moment I have two types of variable in these templates: a static var and a dynamic var. Static vars are usually things like page names or the name of the site, things that don't change often; dynamic vars are things that change on each page load.
My question on this:
Say I have comments on different articles. Which is the better solution: store the simple comment template and render the comments (from a DB call) each time the page is loaded, or store a cached copy of the comments page as an HTML page that is re-cached each time a comment is added, edited or deleted?
Finally
Does anyone have any tips/pointers for running a high-load site on PHP? I'm pretty sure it's a workable language to use (Facebook and Yahoo! give it great prominence), but are there any pitfalls I should watch out for?
Test each query with EXPLAIN and check that it uses indexes, to avoid large table scans.
Cache static data such as menus, counts and other widgets instead of running the query each time (Memcache/d or another cache).
Avoid heavyweight CMSs such as Drupal or WordPress in your projects; build on a framework such as CodeIgniter or Laravel instead.
Combine multiple CSS files into one large file, using minification and compression. For large projects, use a CDN to serve static content and take that load off the server.
Use the latest PHP version (8.1/8.2) and a server with SSDs. The site can be put behind CloudFlare to protect against DDoS attacks, and an Apache module can be used to mitigate Slowloris.
The points made about cache are spot-on; it is the least complicated and most important part of building an efficient application. I'd like to add that while memcached is great, APC is about five times faster if your application lives on a single server.
The "Cache Performance Comparison" post at the MySQL performance blog has some interesting benchmarks on the subject - http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/.
If you are working with large amounts of data and caching isn't cutting it, look into Sphinx. We've had great results using SphinxSearch not only for better text searching, but also as a data retrieval replacement for MySQL when dealing with larger tables. With SphinxSE (the MySQL plugin), the performance gains were several times larger than those we got from caching, and implementing it in the application is a cinch.
APC is an absolute must. Not only does it make for a great caching system, but the gain from the auto-cached PHP files is a godsend. As for the multiple database idea, I don't think you would get much out of having different databases on the same server. It may give you a bit of a gain in speed during query time, but I doubt the effort it would take to deploy and maintain the code for all three while making sure they are in sync would be worth it.
I also highly recommend running Xdebug to find bottlenecks in your program. It made optimization a breeze for me.
My first piece of advice is to think about this issue and keep it in mind when designing the site, but don't go overboard. It's often difficult to predict the success of a new site, and I think your time will be better spent getting it finished early and optimising it later.
In general, simple is fast.
Templates slow you down. Databases slow you down. Complex libraries slow you down. Layer templates over each other, retrieve them from a database and parse them in a complex library, and the time delays multiply with each other.
Once you have the basic site up and running, run tests to show you where to spend your efforts; without them it's difficult to see where to target. Often, to speed things up you will have to unravel the complexity of the code, which makes it larger and harder to maintain, so you only want to do it where necessary.
In my experience, establishing the database connection was relatively expensive. If you can get away with it, don't connect to the database for general visitors on the most trafficked pages, like the front page of the site. Creating multiple database connections is madness with very little benefit.
PDO is also very slow and its API is pretty complicated. No one in their right mind should use it if portability is not a concern. And let's face it, in 99% of all webapps it is not. You just stick with MySQL or PostgreSQL, or whatever it is you are working with.
As for the PHP question and what to take into account. I think premature optimization is the root of all evil. ;) Get your application done first, try to keep it clean when it comes to programming, do a little documentation and write unit tests. With all of the above you will have no issues refactoring code when the time comes. But first you want to be done and push it out to see how people react to it.
It looks like I was wrong. MySQLi is still being developed. But according to the article, PDO_MySQL is now being contributed to by the MySQL team. From the article:
To me, it seems the article is biased towards MySQLi. I suppose I'm biased towards PDO.
I really like PDO over MySQLi. It's straightforward to me. The API is a lot closer to other languages I've programmed in. OO database interfaces seem to work better.
I haven't come across any specific MySQL features that weren't available through PDO. I would be surprised if I ever did.
Actually, many do use APC and memcached together...
I run a website with 7-8 million page views a month. Not terribly much, but enough that our server felt the load. The solution we chose was simple: Memcache at the database level. This solution works well if the database load is your main problem.
We started out using Memcache to cache entire objects and the database results that were most frequently used. It did work, but it also introduced bugs (we might have avoided some of those if we had been more careful).
So we changed our approach. We built a database wrapper (with the exact same methods as our old database, so it was easy to switch), and then we subclassed it to provide memcached database access methods.
Now all you have to do is decide whether a query can use cached (and possibly out-of-date) results or not. Most of the queries run by the users are now fetched directly from Memcache. The exceptions are updates and inserts, which for the main website only happen because of logging. This rather simple measure reduced our server load by about 80%.
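A minimal sketch of the wrapper-plus-subclass idea described above. The class names, the $allowStale flag and the in-memory array standing in for Memcache are all assumptions for illustration, not the poster's actual code; a real version would talk to a Memcache client instead of the array.

```php
<?php
// Stand-in for the existing database class; query() just counts calls here.
class Database
{
    public $queries = 0;

    public function query(string $sql): array
    {
        $this->queries++;
        return [['sql' => $sql]]; // placeholder rows instead of a real result set
    }
}

// Same query() method as the parent, with an opt-in flag for stale results.
class CachedDatabase extends Database
{
    private $cache = []; // hypothetical stand-in for a Memcache client

    public function query(string $sql, bool $allowStale = false, int $ttl = 60): array
    {
        if (!$allowStale) {
            return parent::query($sql); // updates, inserts and must-be-fresh reads
        }
        $key = 'q:' . sha1($sql);
        if (isset($this->cache[$key]) && $this->cache[$key]['expires'] > time()) {
            return $this->cache[$key]['rows']; // cache hit, database untouched
        }
        $rows = parent::query($sql);
        $this->cache[$key] = ['rows' => $rows, 'expires' => time() + $ttl];
        return $rows;
    }
}

$db = new CachedDatabase();
$first = $db->query('SELECT title FROM articles', true);
$second = $db->query('SELECT title FROM articles', true); // served from the cache
```

Because the subclass keeps the parent's method name, existing call sites keep working unchanged and only the queries that can tolerate stale data need the extra argument.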
Profiling your app with something like Xdebug (as tj9991 recommended) is definitely going to be a must. It doesn't make a whole lot of sense to just go around optimizing things blindly. Xdebug will help you find the real bottlenecks in your code so you can spend your optimization time wisely and fix the chunks of code that are actually causing slowdowns.
If you're using Apache, another utility that can help in testing is Siege. It will help you anticipate how your server and application will react to high loads by really putting it through its paces.
Any kind of opcode cache for PHP (like APC or one of the many others) will help a lot as well.
I've worked on a few sites that get millions of hits a month, backed by PHP & MySQL. Here are some basics:
I'd recommend reading Building Scalable Websites; it was written by one of the Flickr engineers and is a great reference.
Check out my blog post about scalability too; it has a lot of links to presentations about scaling with multiple languages and platforms:
http://www.ryandoherty.net/2008/07/13/unicorns-and-scalability/
No two sites are alike. You really need to get a tool like jmeter and benchmark to see where your problem points will be. You can spend a lot of time guessing and improving, but you won't see real results until you measure and compare your changes.
For example, for many years, the MySQL query cache was the solution to all of our performance problems. If your site was slow, MySQL experts suggested turning the query cache on. It turns out that if you have a high write load, the cache is actually crippling. If you turned it on without testing, you'd never know.
And don't forget that you are never done scaling. A site that handles 10 req/s will need changes to support 1,000 req/s. And if you're lucky enough to need to support 10,000 req/s, your architecture will probably look completely different as well.
Databases
Caching
I don't see myself switching from MySQL anytime soon - so I guess I don't need the abstraction capabilities of PDO. Thanks for those articles DavidM, they've helped me a lot.
First question is how big do you really expect it to be? And how much do you plan on investing in your infrastructure? Since you feel the need to ask the question here, I'm guessing that you expect to start small on a limited budget.
Performance is irrelevant if the site is not available, and for availability you need horizontal scaling. The minimum you can sensibly get away with is 2 servers, both running Apache, PHP and MySQL. Set up one DBMS as a slave to the other. Do all the writes on the master and all the reads on the local database (whichever that is), unless for some reason you need to read back the data you've just written (then use the master). Make sure you've got the machinery in place to automatically promote the slave and fence the master. Use round-robin DNS for the webserver addresses to give more affinity for the slave node.
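The read/write split above could be sketched roughly like this. The function name, the 'master'/'local' labels and the $readAfterWrite flag are hypothetical, and the SELECT check is deliberately naive (it would misroute something like SELECT ... FOR UPDATE), so treat it as an illustration rather than a finished router.

```php
<?php
// Decide which connection a statement should use: the master or the local replica.
// $readAfterWrite is a hypothetical flag the caller sets when it must see
// data it has just written.
function pick_connection(string $sql, bool $readAfterWrite = false): string
{
    $isSelect = preg_match('/^\s*SELECT\b/i', $sql) === 1;
    if ($isSelect && !$readAfterWrite) {
        return 'local';
    }
    return 'master'; // writes, and reads that cannot tolerate replication lag
}

$forRead  = pick_connection('SELECT * FROM comments WHERE article_id = 7');
$forWrite = pick_connection('INSERT INTO comments (body) VALUES ("hi")');
$forFresh = pick_connection('SELECT * FROM comments WHERE id = 9', true);
```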
Partitioning your data across different database nodes at this stage is a very bad idea; however, you might want to consider splitting it across different databases on the same server (which will facilitate partitioning across nodes when you overtake Facebook).
Do make sure you've got the monitoring and data analysis tools in place to measure your site's performance and identify bottlenecks. Most performance problems can be fixed by writing better SQL / fixing the database schema.
Keeping your template cache in the database is a dumb idea: the database should be a central common repository for structured data. Keep your template cache on the local filesystem of your webservers. It will be available faster and won't slow down your database access.
Do use an opcode cache.
Spend plenty of time studying your site and its logs to understand why it's going so slow.
Push as much caching as possible onto the client.
Use mod_gzip (or mod_deflate on Apache 2) to compress everything you can.
C.
A lot of good answers were given already, but I would like to point you to an alternative opcode cache called XCache. It was created by a lighttpd ("lighty") contributor.
Also, if you need to load balance your database servers in future, MySQL Proxy could very well help you achieve this.
Both of those tools should plug into an existing application quite easily, so this optimization can be done when you need it, without too much hassle.
Sure, PDO is nice, but there has been some controversy about its performance versus mysql and mysqli, although that seems to be fixed now.
You should use PDO if you envision portability, but if not, mysqli should be the way. It has an OO interface, prepared statements, and most of what PDO offers (except, well, portability).
Plus, if performance is really needed, prepare for the (native MySQL) mysqlnd driver in PHP 5.3, which will be much more tightly integrated with PHP, with better performance and improved memory usage (and statistics for performance tuning).
Memcache is nice if you have clustered servers (and YouTube-like load), but I'd try out APC first too.
General
Code
Databases
Caching
Miscellaneous
I can't believe no-one has already mentioned this: Modularisation and Abstraction. If you think your site is going to have to grow to lots of machines, you must design it so it can! That means stupid things like don't assume the database is on localhost. It also means things that are going to be a bother at first, like writing a database abstraction layer (like PDO, but much much lighter because it only does what you need it to do).
And it means things like working with a framework. You will need layers to your code so that you can later gain performance by refactoring the data-abstraction layer, for example, by teaching it that some objects are in a different database -- and the code doesn't have to know or care.
Finally, be careful of memory-intensive operations, for example, unnecessary string copying. If you can keep PHP's memory usage down, then you will get more performance out of your webserver and this is something that will scale when you go to a load-balanced solution.
Look into mod_cache, an output cache for the Apache web server, similar to the output caching in ASP.NET.
Yes, I can see that it's still experimental but it will be final someday.
@Gary
I'm looking over PDO at the moment and it looks like you're right. However, I know that MySQL are developing the MySQLnd extension for PHP, which I think is meant to succeed either the mysql or the MySQLi extension. What do you think about that?
@Ryan, Eric, tj9991
Thanks for the advice on PHP's caching extensions - could you explain reasons for using one over another? I've heard great things about memcached through IRC but have never heard of APC - what are your opinions on them? I assume using multiple caching systems is pretty counter-effective.
I will definitely be sorting out some profiling testers - thank you very much for your recommendations on those.
I'm a lead developer on a site with over 15M users. We have had very few scaling problems because we planned for it EARLY and scaled thoughtfully. Here are some of the strategies I can suggest from my experience.
SCHEMA
First off, denormalize your schemas. This means that rather than having multiple relational tables, you should instead opt to have one big table. In general, joins are a waste of precious DB resources because doing multiple prepares and collations burns disk I/O. Avoid them when you can.
The trade-off here is that you will be storing/pulling redundant data, but this is acceptable because data and intra-cage bandwidth are very cheap (bigger disks), whereas multiple prepare I/Os are orders of magnitude more expensive (more servers).
INDEXING
Make sure that your queries utilize at least one index. Beware though, that indexes will cost you if you write or update frequently. There are some experimental tricks to avoid this.
You can try adding additional columns that aren't indexed and that run parallel to your indexed columns. Then you can have an offline process that writes the non-indexed columns over the indexed columns in batches. This way, you have better control over when MySQL needs to recompute the index.
Avoid computed queries like a plague. If you must compute a query, try to do this once at write time.
CACHING
I highly recommend Memcached. It has been proven by the biggest players on the PHP stack (Facebook) and is very flexible. There are two methods to doing this, one is caching in your DB layer, the other is caching in your business logic layer.
The DB layer option would require caching the result of queries retrieved from the DB. You can hash your SQL query using md5() and use that as a lookup key before going to database. The upside to this is that it is pretty easy to implement. The downside (depending on implementation) is that you lose flexibility because you're treating all caching the same with regard to cache expiration.
In the shop I work in, we use business layer caching, which means each concrete class in our system controls its own caching schema and cache timeouts. This has worked pretty well for us, but be aware that items retrieved from DB may not be the same as items from cache, so you will have to update cache and DB together.
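As a rough illustration of business-layer caching, here is a sketch in which one class owns its cache keys and timeout and writes to the cache and the database together. ArrayCache, UserStore, the key format and the 300-second timeout are invented for this example; the array-backed cache only stands in for Memcached, and the private array stands in for the real table.

```php
<?php
// Tiny in-memory cache with per-item expiry; a stand-in for Memcached.
class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key]) || $this->items[$key][1] < time()) {
            return null; // missing or expired
        }
        return $this->items[$key][0];
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Hypothetical business class that controls its own key scheme and timeout.
class UserStore
{
    const CACHE_TTL = 300; // seconds; each class picks its own value

    public $dbReads = 0;
    private $rows = []; // stand-in for the users table
    private $cache;

    public function __construct(ArrayCache $cache)
    {
        $this->cache = $cache;
    }

    public function save(int $id, array $user): void
    {
        $this->rows[$id] = $user;                              // write to the "DB"
        $this->cache->set("user:$id", $user, self::CACHE_TTL); // and the cache together
    }

    public function find(int $id): ?array
    {
        $user = $this->cache->get("user:$id");
        if ($user === null) {
            $this->dbReads++; // cache miss: fall back to the "DB"
            $user = $this->rows[$id] ?? null;
            if ($user !== null) {
                $this->cache->set("user:$id", $user, self::CACHE_TTL);
            }
        }
        return $user;
    }
}

$store = new UserStore(new ArrayCache());
$store->save(1, ['name' => 'ann']);
$ann = $store->find(1); // answered from the cache, no "DB" read
```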
DATA SHARDING
Replication only gets you so far. Sooner than you expect, your writes will become a bottleneck. To compensate, make sure to support data sharding as early as possible. You will likely want to shoot yourself later if you don't.
It is pretty simple to implement. Basically, you want to separate the key authority from the data storage. Use a global DB to store a mapping between primary keys and cluster ids. You query this mapping to get a cluster, and then query the cluster to get the data. You can cache the hell out of this lookup operation which will make it a negligible operation.
The downside to this is that it may be difficult to piece together data from multiple shards. But, you can engineer your way around that as well.
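The key-to-cluster lookup described above might look something like the following sketch. ShardDirectory, fetch_user and the sample data are hypothetical, and plain arrays stand in for the global mapping database, the lookup cache and the clusters themselves.

```php
<?php
// Hypothetical directory: maps a primary key to the cluster holding its row.
class ShardDirectory
{
    public $lookups = 0;
    private $globalMap;  // stand-in for the global mapping database
    private $cache = []; // stand-in for a cached copy of the mapping

    public function __construct(array $globalMap)
    {
        $this->globalMap = $globalMap;
    }

    public function clusterFor(int $id): int
    {
        if (!isset($this->cache[$id])) {
            $this->lookups++; // one trip to the global DB per uncached key
            if (!isset($this->globalMap[$id])) {
                throw new OutOfBoundsException("No cluster for key $id");
            }
            $this->cache[$id] = $this->globalMap[$id];
        }
        return $this->cache[$id];
    }
}

// Stand-ins for two clusters, each holding part of the users table.
$clusters = [
    1 => [10 => 'alice', 11 => 'bob'],
    2 => [20 => 'carol'],
];
$directory = new ShardDirectory([10 => 1, 11 => 1, 20 => 2]);

// First ask the directory which cluster owns the key, then read from that cluster.
function fetch_user(ShardDirectory $dir, array $clusters, int $id): string
{
    return $clusters[$dir->clusterFor($id)][$id];
}

$name = fetch_user($directory, $clusters, 20);
```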
OFFLINE PROCESSING
Don't make the user wait for your backend if they don't have to. Build a job queue and move any processing that you can offline, doing it separate from the user's request.
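A bare-bones sketch of the job-queue idea, assuming hypothetical enqueue() and run_worker() helpers. The SplQueue here lives only in memory for illustration; a real queue has to outlive the request (for example a database table or a queue server) so that a separate worker process can drain it.

```php
<?php
// In-memory queue for illustration only.
$queue = new SplQueue();

// Called during the user's request: record the work and return immediately.
function enqueue(SplQueue $queue, string $type, array $payload): void
{
    $queue->enqueue(['type' => $type, 'payload' => $payload]);
}

// Called by a separate worker: drain the queue and return the number of jobs handled.
function run_worker(SplQueue $queue, array $handlers): int
{
    $done = 0;
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();
        if (isset($handlers[$job['type']])) {
            $handlers[$job['type']]($job['payload']);
            $done++;
        }
    }
    return $done;
}

$sent = [];
enqueue($queue, 'welcome_email', ['to' => 'user@example.com']);
$handled = run_worker($queue, [
    'welcome_email' => function (array $p) use (&$sent) {
        $sent[] = $p['to']; // a real handler would send the email here
    },
]);
```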
For what it's worth, caching is DIRT SIMPLE in PHP, even without an extension/helper package like memcached.
All you need to do is create an output buffer using ob_start().
Create a global cache function. First look for a cached version of the page; if it exists, serve it and end.
If it doesn't exist, call ob_start(), passing the function as a callback, and let the script continue processing. When the buffer is flushed at the matching ob_end_flush() (PHP has no plain ob_end()), it will call the function you specified with the buffer contents. At that point you just drop the contents in a file, save the file, return the contents, and end.
Add in some expiration/garbage collection.
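A small sketch of the output-buffer cache described above. The cached_page() helper, its parameters and the file naming are made up for this example, and it collects the buffer with ob_get_clean() rather than an ob_start() callback to keep the flow easy to follow; expiry is a simple age check on the cache file.

```php
<?php
// Hypothetical helper: return the page for $key from a file cache,
// or render it inside an output buffer and store the result.
function cached_page(string $key, int $ttl, callable $render, ?string $dir = null): string
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . '/page_' . md5($key) . '.html';

    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return (string) file_get_contents($file);
    }

    ob_start();                      // start capturing everything that is echoed
    $render();
    $html = (string) ob_get_clean(); // stop capturing and fetch the output

    // Write to a temporary name, then rename, so readers never see a half-written page.
    $tmp = $file . '.' . uniqid('', true);
    file_put_contents($tmp, $html);
    rename($tmp, $file);

    return $html;
}

// Usage: the closure only runs when there is no fresh cached copy.
echo cached_page('article-42', 60, function () {
    echo '<h1>Article 42</h1>';
});
```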
And many people don't realize you can nest ob_start()/ob_end_flush() calls. So if you're already using an output buffer to, say, parse in advertisements or do syntax highlighting or whatever, you can just nest another ob_start()/ob_end_flush() call.
Firstly, as I think Knuth said, "Premature optimization is the root of all evil". If you don't have to deal with these issues right now then don't; focus on delivering something that works correctly first. That being said, if the optimizations can't wait:
Try profiling your database queries, figure out what's slow and what happens a lot, and come up with an optimization strategy from that.
I would investigate Memcached, as it's what a lot of the higher-load sites use for efficiently caching content of all types, and the PHP object interface to it is quite nice.
Splitting up databases among servers and using some sort of load balancing technique (e.g. generate a random number between 1 and the number of redundant databases holding the necessary data, and use that number to determine which database server to connect to) can also be an excellent way to increase efficiency.
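The random-number technique just mentioned could be sketched as follows. The pick_replica() name and the host names are placeholders, and the sketch assumes every host in the list holds the data the request needs.

```php
<?php
// Pick one of the redundant database hosts at random for this request.
function pick_replica(array $hosts): string
{
    $hosts = array_values($hosts); // make sure the keys run 0..n-1
    if ($hosts === []) {
        throw new InvalidArgumentException('No database hosts configured');
    }
    return $hosts[random_int(0, count($hosts) - 1)];
}

$hosts = ['db1.example.internal', 'db2.example.internal', 'db3.example.internal'];
$host = pick_replica($hosts); // connect to $host for this request
```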
These have all worked out pretty well in the past for some fairly high load sites. Hope this helps to get you started :-)
Re: PDO / MySQLi / MySQLND
@gary
You cannot just say "don't use MySQLi", as they have different goals. PDO is almost like an abstraction layer (although it is not actually one) and is designed to make it easy to use multiple database products, whereas MySQLi is specific to MySQL connections. It is wrong to say that PDO is the modern access layer in the context of comparing it to MySQLi, because your statement implies that the progression has been mysql -> mysqli -> PDO, which is not the case.
The choice between MySQLi and PDO is simple - if you need to support multiple database products then you use PDO. If you're just using MySQL then you can choose between PDO and MySQLi.
So why would you choose MySQLi over PDO? See below...
@ross
You are correct about MySQLnd, which is the newest MySQL core language-level library; however, it is not a replacement for MySQLi. MySQLi (as with PDO) remains the way you would interact with MySQL through your PHP code. Both of these use libmysql as the C client behind the PHP code. The problem is that libmysql is outside of the core PHP engine, and that is where mysqlnd comes in: it is a native driver that makes use of the core PHP internals to maximise efficiency, specifically where memory usage is concerned.
MySQLnd is being developed by MySQL themselves and has recently landed on the PHP 5.3 branch, which is in RC testing, ready for a release later this year. You will then be able to use MySQLnd with MySQLi...but not with PDO. This will give MySQLi a performance boost in many areas (not all) and will make it the best choice for MySQL interaction if you do not need the abstraction-like capabilities of PDO.
That said, MySQLnd is now available in PHP 5.3 for PDO and so you can get the advantages of the performance enhancements from ND into PDO, however, PDO is still a generic database layer and so will be unlikely to be able to benefit as much from the enhancements in ND as MySQLi can.
Some useful benchmarks can be found here although they are from 2006. You also need to be aware of things like this option.
There are a lot of considerations that need to be taken into account when deciding between MySQLi and PDO. In reality it is not going to matter until you get to ridiculously high request numbers, and in that case it makes more sense to be using an extension that has been specifically designed for MySQL rather than one which abstracts things and happens to provide a MySQL driver.
It is not a simple matter of which is best because each has advantages and disadvantages. You need to read the links I've provided and come up with your own decision, then test it and find out. I have used PDO in past projects and it is a good extension but my choice for pure performance would be MySQLi with the new MySQLND option compiled (when PHP 5.3 is released).