对使用多少线程将数据加载到 MS SQL Server 2005 数据库中的最佳猜测
我有大约 4 天的时间来加载 400 万行数据,其中存储过程似乎每行大约需要 500 毫秒,因此我建议我们暂时重新调整两台计算机的用途,一台作为 SQL Server,另一台用于提供数据。每台机器都是一台 4 个 cpu、2 个核心/cpu 的计算机,具有大量内存。
关于如何加载数据有两种相互竞争的想法。
- 在导入计算机上保持使用虚拟机,这限制了我只能使用 4 个核心。
- 放弃使用 VM 进行加载,并使用所有 8 个核心进行加载。
数据库将位于实际计算机上,而不是虚拟机上。
我将尝试展示使用虚拟机将极大地影响性能,但这只是一种直觉。我需要证明这一点,所以我建议我们使用较小的数据集进行测试加载,因此我们可能只加载 6G,而不是加载大约 12G 的数据,并获取数字。
但是,由于我有一次尝试,所以我想合理猜测这两种环境中可以使用多少线程。
我的猜测是 4 个线程/核心,因为数据库事务涉及大量线程等待时间,但我不知道虚拟机上的线程是否太多。我不确定如何确定建议的合理线程数。
限制因素可能不是具有加载数据的程序的计算机,可能是数据库计算机实际上才是限制因素,32 个并发数据库连接对于 8 核机器来说可能太多了。
那么,我对每个核心 4 个线程的猜测在这两种环境中是否合理?
计算机上的操作系统是 Windows Server 2008 R2,虚拟机将使用 Hyper-V。
更新:
以防万一,我正在使用 .NET 4,使用并行框架,因此我通过在配置文件中更改它来调整并发线程的数量,因此我大量使用 PLINQ。
更新2:
今天我有一个想法来加快速度,我们将使用一个4核VM进行基准测试,但SQL Server将使用24G虚拟驱动器(RAM驱动器),因为计算机有48G RAM,数据库将完全在内存中。这样我应该能够像供给程序一样快地用多个线程供给它。如果这还不够快,那么我们将使用所有 12 个核心来为其提供支持,但他们预计不需要它。因此,我应该能够在 1 天内加载所有数据,而不是一个多月,但是,一旦我获得一些基准,我会让每个人都知道。我可能会考虑使用较小的桌面来为其提供数据,以查看使用虚拟机是否会受到影响,但此机器上只有一个虚拟机,因此 8 个核心将未被使用。顺便说一句,Hyper-V 机器上的操作系统是 Windows Core。
I have about 4 days to load 4 million rows of data, where the stored procedures appear to take about 500ms/row, so I am suggesting we temporarily repurpose two computers, one as the SQL Server and one to feed it. Each machine is a 4 cpu, 2 core/cpu computer with tons of memory.
There are two competing ideas on how to load the data.
- on the import computer maintain using a VM, which limits me to 4 cores of use.
- ditch the VM for the load and use all 8 cores for loading.
The database will be on the actual computer, no VM.
I will get one shot at showing that using the VM will considerably impact performance, but that is just a gut feeling. I need to prove it, so I am going to suggest we do a test load with a smaller dataset, so rather than loading about 12G data we will do perhaps only 6G, and get numbers.
But, since I have one shot at this, I want to get a reasonable guess at how many threads can be used in both environments.
My guess is 4 threads/core, since database transactions involve a great deal of waiting time for threads, but I don't know if that would be too many threads on the VM. I am not certain how to determine what is a reasonable number of threads to suggest.
The limiting factor may not be the computer that has the program that loads the data, it may be the database computer will actually be the limiting factor, and 32 concurrent database connections may be too much for an 8 core machine.
So, is my guess of 4 threads per core reasonable in both environments?
The OS on the computers is Windows Server 2008 R2 and the VM would be using Hyper-V.
UPDATE:
In case it matters, I am using .NET 4, using the parallel framework, so I adjust the number of concurrent threads by changing it in my configuration file, so I use PLINQ heavily.
UPDATE 2:
I had an idea today to speed things up, where we will use one 4 core VM to benchmark but the SQL Server will use a 24G virtual drive (ram drive), since the computer has 48G RAM, and the database will be completely in memory. This way I should be able to feed it with several threads as fast as the feeder program can. If that isn't fast enough then we will use all 12 cores to feed it, but they expect it won't be needed. So, I should be able to hopefully load all the data in 1 day instead of a bit over a month, but, once I get some benchmarks I will let everyone know. I may see about using a smaller desktop to feed it to see if there is a penalty by using a VM, but there will only be one VM on this box, so 8 cores will be unused. The OS on the Hyper-V box is Windows Core, btw.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
涉及的变量太多,我想说你的猜测和任何猜测一样好。你已经合理地考虑过了,测试将是你最好的选择,看看你是否需要为大的运行而调整。
There are so many variables involved that I would say your guess is as good as any. You have thought it out reasonably and testing will be your best bet to see if you need to adjust for the big run.
做一个小测试,用每个核心 1、2、3、4、5、6 个线程进行测试。把它画出来。应该显示得很清楚...
Make a small test, and test with 1, 2, 3, 4, 5, 6 threads per core. Graph it out. Should show pretty clearly...