Web application receives millions of requests, leading to millions of row inserts every 30 seconds in SQL Server 2008
I am currently addressing a situation where our web application receives at least a million requests every 30 seconds. These requests generate 3-5 million row inserts across 5 tables, which is a pretty heavy load to handle. Currently we use multithreading to handle this (it is a bit faster, but we are unable to get better CPU throughput). However, the load will definitely increase in the future and we will have to account for that too. Six months from now we expect double the load we currently receive, so I am looking for a new solution that is scalable and can easily accommodate any further increase in this load.
Currently, multithreading makes the whole debugging scenario quite complicated, and sometimes we have trouble tracing issues.
FYI, we are already using the SQL Bulk Insert/Copy mentioned in this previous post:
Sql server 2008 - performance tuning features for insert large amount of data
However, I am looking for a more capable solution (I think there should be one) that will address this situation.
Note: I am not looking for any code snippets or code examples. I am just looking for the big picture of a concept that I could use, and I am sure I can take that further to an elegant solution :)
Also, the solution should make better use of threads and processes. I do not want my threads/processes to wait to execute something because of some other resource.
Any suggestions will be deeply appreciated.
Update: Not every request will lead to an insert; however, most of them will lead to some SQL operation. The application performs different types of transactions, and these lead to a lot of bulk SQL operations. I am more concerned about inserts and updates.
These operations need not be real time; there can be a bit of lag. However, processing them in real time would be very helpful.
Comments (3)
Since you do not need your inserts/updates to happen in real time, you might consider having two databases: one for reads and one for writes. This is similar to having an OLTP db and an OLAP db:
Read Database:
Insert/Update database:
You would basically direct all insert/update actions to the Insert/Update db. You would then create a publication process that moves data over to the read database at certain time intervals. When I have seen this in the past, the data was usually moved over on a nightly basis, when few people were using the site. There are a number of options for moving the data over, but I would start by looking at SSIS.
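To make the publication step concrete, here is a minimal sketch of the idea, not a production design. It assumes Python with in-memory SQLite standing in for the two SQL Server databases, and a hypothetical `events` table with an increasing `id`; `make_db` and `publish` are made-up names. It only covers inserts (updates would need a change-tracking column or similar), and in practice a tool such as SSIS would do this job.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a stand-in database with a hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def publish(write_db: sqlite3.Connection,
            read_db: sqlite3.Connection,
            watermark: int) -> int:
    """Copy rows with id > watermark from the write db to the read db.

    Returns the new watermark (the highest id that has been published).
    """
    rows = write_db.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    if not rows:
        return watermark  # nothing new since the last run
    with read_db:  # one transaction per publication run
        read_db.executemany(
            "INSERT INTO events (id, payload) VALUES (?, ?)", rows
        )
    return rows[-1][0]
```

A scheduler would call `publish` at whatever interval is acceptable (nightly, or every few minutes) and store the returned watermark for the next run.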
This will depend on your ability to do a few things:
I think your problem is more about getting better CPU throughput, which will lead to better performance. So I would look at something like asynchronous processing, in which a thread never sits idle. You will probably have to maintain a queue in the form of a linked list or any other data structure that suits your programming model.
The way this would work is that your threads try to perform a given job immediately; if anything stops them from doing so, they push that job onto the queue, and the queued items are processed in an order determined by how the container/queue stores them.
In your case, since you are already using bulk SQL operations, you should be good to go with this strategy.
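As a rough illustration, here is a simplified variant of that idea in Python: request threads always enqueue their rows without waiting, and a single writer thread drains the queue in batches. Everything here is hypothetical (`writer_loop`, `run_demo`, the `flush` callback, the batch size); in a real system `flush` would be the existing bulk insert call.

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel that tells the writer to shut down


def writer_loop(jobs: queue.Queue,
                flush: Callable[[List[tuple]], None],
                max_batch: int = 1000) -> None:
    """Drain `jobs` in batches of at most `max_batch` rows and flush each one."""
    while True:
        item = jobs.get()  # only the writer thread ever blocks here
        if item is _STOP:
            return
        batch = [item]
        stop_seen = False
        while len(batch) < max_batch:
            try:
                nxt = jobs.get_nowait()  # take whatever is already queued
            except queue.Empty:
                break
            if nxt is _STOP:
                stop_seen = True
                break
            batch.append(nxt)
        flush(batch)  # stand-in for one bulk insert
        if stop_seen:
            return


def run_demo(n_rows: int = 2500, max_batch: int = 1000) -> List[int]:
    """Enqueue `n_rows` rows and return the size of each flushed batch."""
    jobs: queue.Queue = queue.Queue()  # unbounded, so put() never blocks
    batch_sizes: List[int] = []
    writer = threading.Thread(
        target=writer_loop,
        args=(jobs, lambda b: batch_sizes.append(len(b)), max_batch),
    )
    writer.start()
    for i in range(n_rows):
        jobs.put((i, f"row-{i}"))  # request threads would do this
    jobs.put(_STOP)
    writer.join()
    return batch_sizes
```

The exact batch sizes depend on thread timing, but every row is flushed exactly once and no batch exceeds `max_batch`. An unbounded queue trades memory for never blocking producers; a bounded queue would apply back-pressure instead.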
lemme know if this helps you.
Can you partition the database so that the inserts are spread around? How is this data used after insert? Is there a natural partition to the data by client or geography or some other factor?
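If a natural key such as a client id exists, routing inserts by that key could look roughly like the sketch below. This is only an application-side illustration in Python with made-up names (`partition_for`, `group_by_partition`, a `client_id` field); SQL Server's own table partitioning is configured in the database and is not shown here.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(client_id: str, num_partitions: int) -> int:
    """Map a client id to a partition index with a stable hash (CRC32)."""
    return zlib.crc32(client_id.encode("utf-8")) % num_partitions


def group_by_partition(rows: Iterable[dict],
                       num_partitions: int) -> Dict[int, List[dict]]:
    """Group rows so each partition can be bulk-inserted independently."""
    groups: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        groups[partition_for(row["client_id"], num_partitions)].append(row)
    return dict(groups)
```

Because the hash is stable, all rows for one client always land in the same partition, so each partition's batch can be written by its own worker without contending with the others.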
Since you are using SQL Server, I would suggest you get several of the books on high availability and high performance for SQL Server. The internals book might help as well. Amazon has a bunch of these. This is a complex subject and requires too much depth for a simple answer on a bulletin board. But basically there are several keys to high-performance design, including hardware choices, partitioning, correct indexing, correct queries, etc. To do this effectively, you have to understand in depth what SQL Server does under the hood and how changes can make a big difference in performance.