High-load web project using an SQL database
I want to ask about approaches to developing high-load web applications on top of SQL databases.
Assume we have a simple CMS that serves articles, and we want to store the number of visits to each article in the database. The visit counter is incremented every time a user views the article.
In SQL terms, the "Article" table has an integer "visits" column that must be incremented on every visit. When an article receives many concurrent visits, the "visits" value of that row has to be updated correctly, without losing increments.
I use pessimistic locking: "SELECT .. FOR UPDATE". Every time a user visits an article, I lock that article's row in the "Article" table and increment its "visits" counter.
Is this approach correct?
I use MySQL in my projects.
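For reference, the lock-then-increment pattern described in the question might look roughly like the sketch below. It is only an illustration: it runs against SQLite so it is self-contained, and because SQLite has no row-level "SELECT .. FOR UPDATE", it swaps in `BEGIN IMMEDIATE` (a database-wide write lock) as a stand-in. The function names `record_visit_locked` and `make_demo_db` and the `id` and `title` columns are assumptions, not part of the original question.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the "Article" table (schema is assumed)."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE Article ("
        "id INTEGER PRIMARY KEY, title TEXT, visits INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO Article (id, title) VALUES (1, 'Hello')")
    return conn


def record_visit_locked(conn: sqlite3.Connection, article_id: int) -> int:
    """Read-modify-write while holding a write lock for the whole transaction."""
    # Stand-in for MySQL's START TRANSACTION + SELECT .. FOR UPDATE:
    # SQLite takes its write lock here, before the row is read.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT visits FROM Article WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            raise KeyError(article_id)
        new_count = row[0] + 1
        conn.execute(
            "UPDATE Article SET visits = ? WHERE id = ?", (new_count, article_id)
        )
        conn.execute("COMMIT")  # releases the lock
        return new_count
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Every visit holds the lock from the read until the commit, so concurrent visits to the same article are processed one after another.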
Comments (2)
This approach will work for medium-load sites, but eventually you will run into locking problems.
On a very high-load website, you would want to implement a message queue and send "visited" events to it.
An offline process would read the queue and update the columns accordingly. That way, only a single process is updating that particular column at any time.
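A minimal sketch of that idea, under stated assumptions: an in-process `queue.Queue` stands in for a real message queue, SQLite stands in for MySQL, and the names `publish_visit` and `drain_and_apply` are hypothetical. The request handler only enqueues an event; the single offline worker drains the queue and applies one batched UPDATE per article.

```python
import queue
import sqlite3
from collections import Counter

# Stand-in for a real message broker; each event is just an article id.
visit_events: "queue.Queue[int]" = queue.Queue()


def publish_visit(article_id: int) -> None:
    """Called from the web request: enqueue the event, no database write."""
    visit_events.put(article_id)


def drain_and_apply(conn: sqlite3.Connection) -> int:
    """Run by the single offline worker; returns how many events were applied."""
    pending: Counter = Counter()
    while True:
        try:
            pending[visit_events.get_nowait()] += 1
        except queue.Empty:
            break
    # One transaction for the whole batch, one UPDATE per distinct article.
    with conn:
        conn.executemany(
            "UPDATE Article SET visits = visits + ? WHERE id = ?",
            [(count, article_id) for article_id, count in pending.items()],
        )
    return sum(pending.values())
```

Batching also means a burst of visits to one article becomes a single row update instead of one per request. The trade-off is that the stored counter lags behind real traffic by up to one worker cycle.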
I would definitely not recommend storing aggregate data (like "number of visits") in the article row. Instead, use a separate table in which you log one new record per visit. Store the timestamp, article_id, IP address and other data there.
The rationale is that you do not have to lock the article's database record on every visit, which would likely result in locking issues / contention.
Now, when you want the number of views, do a simple select on the log table, counting the rows for that article. For medium-sized sites that should be "good enough"; as the load increases, you will have to calculate the number of views per article on a regular basis and cache the view counters to speed up access to this data.
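A rough sketch of the log-table approach, again with SQLite standing in for MySQL. The table names `article_visit_log` and `article_visit_count`, the column names, and the helper functions are all assumptions chosen for illustration; only the timestamp, article_id and IP address fields come from the answer itself.

```python
import sqlite3
import time

# Assumed schema: an append-only visit log plus a small cache table
# that a periodic job refreshes.
SCHEMA = """
CREATE TABLE article_visit_log (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL,
    visited_at REAL NOT NULL,
    ip_address TEXT
);
CREATE INDEX idx_visit_article ON article_visit_log (article_id);
CREATE TABLE article_visit_count (
    article_id INTEGER PRIMARY KEY,
    visits INTEGER NOT NULL
);
"""


def log_visit(conn: sqlite3.Connection, article_id: int, ip_address: str) -> None:
    """Record one visit as a new row; no existing row is updated."""
    with conn:
        conn.execute(
            "INSERT INTO article_visit_log (article_id, visited_at, ip_address) "
            "VALUES (?, ?, ?)",
            (article_id, time.time(), ip_address),
        )


def count_visits(conn: sqlite3.Connection, article_id: int) -> int:
    """The "simple select": count log rows for one article."""
    return conn.execute(
        "SELECT COUNT(*) FROM article_visit_log WHERE article_id = ?",
        (article_id,),
    ).fetchone()[0]


def refresh_cached_counts(conn: sqlite3.Connection) -> None:
    """Periodic job: recompute per-article totals into the cache table."""
    with conn:
        conn.execute("DELETE FROM article_visit_count")
        conn.execute(
            "INSERT INTO article_visit_count (article_id, visits) "
            "SELECT article_id, COUNT(*) FROM article_visit_log GROUP BY article_id"
        )
```

On a medium-sized site, pages could call `count_visits` directly; once the log grows, they would read from `article_visit_count` instead and accept that the number is only as fresh as the last refresh.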