Need advice: SQL Server DB architecture for a large database
Hi all!
My client currently has a SQL Server database that performs 3-4 million inserts, about as many updates, and even more reads a day, every day. The current DB is laid out weirdly, IMHO: incoming data goes into a "Current" table, then every night records are moved to the corresponding monthly tables (i.e. MarchData, AprilData, MayData, etc.), which are exact copies of the Current table (schema-wise, I mean). Reads are done from a view that UNIONs all the monthly tables and the Current table; inserts and updates go only to the Current table. It was explained to me that splitting the data into 13 tables was motivated by the fact that each table uses a separate data file, and those data files are written to 13 physical hard drives. So each table gets its own hard drive, supposedly speeding up the view's performance. What I'm noticing is that the nightly record move to the monthly tables (which runs every 2 minutes over the 8-hour night window) coincides with the full backup, and the DB starts crawling, the web site times out, etc.
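Roughly, the read view looks like this (table and column names simplified for illustration):

    -- Simplified sketch of the reporting view; the real one covers all 12 monthly tables
    CREATE VIEW dbo.AllData
    AS
    SELECT RecordId, CreatedAt, Payload FROM dbo.CurrentData
    UNION ALL
    SELECT RecordId, CreatedAt, Payload FROM dbo.MarchData
    UNION ALL
    SELECT RecordId, CreatedAt, Payload FROM dbo.AprilData
    UNION ALL
    SELECT RecordId, CreatedAt, Payload FROM dbo.MayData;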
I was wondering: is this approach really the best one out there, or should we consider a different one? Please bear in mind that the database is about 300-400 GB and growing by 1.5-2 GB a day. Every so often we move records more than 12 months old to a separate archive database.
Any insight is highly appreciated.
2 Answers
If you are using MS SQL Server, consider Partitioned Tables and Indexes.
In short: you can group your rows by some value, e.g. by year and month. Each group can be accessed as a separate table with its own index, so you can list, summarize, and edit February 2011 sales without touching all the other rows. Partitioned tables complicate the database, but for extremely long tables they can deliver significantly better performance. Partitioning also supports filegroups, so values can be stored on different disks.
This Microsoft-made solution looks very similar to yours, except for one important thing: it doesn't move records overnight.
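A minimal sketch of what monthly partitioning could look like, assuming a datetime column drives the split (all names here are illustrative):

    -- Partition function: RANGE RIGHT puts each boundary date into the partition
    -- to its right, so every month lands in its own partition
    CREATE PARTITION FUNCTION pfMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('20110201', '20110301', '20110401', '20110501');

    -- Partition scheme: here all partitions go to PRIMARY, but each partition
    -- could be mapped to its own filegroup (and hence its own disk) instead
    CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly
    ALL TO ([PRIMARY]);

    -- One table on the scheme; SQL Server routes each row by CreatedAt automatically
    CREATE TABLE dbo.SalesData
    (
        RecordId  bigint       NOT NULL,
        CreatedAt datetime     NOT NULL,
        Payload   varchar(100) NULL
    ) ON psMonthly (CreatedAt);

Queries that filter on CreatedAt touch only the relevant partitions, and aging out a whole month (ALTER TABLE ... SWITCH) is a metadata-only operation instead of a nightly row-by-row move.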
There is one statement for that: IDIOTS AT WORK.
Tables are not stored on discs, but in filegroups, which can span multiple data files. Note this... so you can have one filegroup with 13 data files on 13 discs, and a single table will be DISTRIBUTED OVER ALL 13 DISCS. No need to play stupid silly games to distribute the load; it is already possible just by reading the documentation.
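For illustration (database name, paths, and sizes are placeholders):

    -- One filegroup backed by data files on separate physical discs;
    -- SQL Server stripes allocations proportionally across all files in the group
    ALTER DATABASE MyBigDb ADD FILEGROUP DataFG;

    ALTER DATABASE MyBigDb
    ADD FILE (NAME = DataFG01, FILENAME = 'D:\sql\DataFG01.ndf', SIZE = 50GB)
    TO FILEGROUP DataFG;

    ALTER DATABASE MyBigDb
    ADD FILE (NAME = DataFG02, FILENAME = 'E:\sql\DataFG02.ndf', SIZE = 50GB)
    TO FILEGROUP DataFG;
    -- ...repeat ADD FILE once per disc...

    -- Then put the table (the ONE table) on that filegroup
    CREATE TABLE dbo.AllData
    (
        RecordId  bigint   NOT NULL,
        CreatedAt datetime NOT NULL
    ) ON DataFG;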
Even then, I seriously doubt 13 discs are fast. Really. I privately run a smaller database (merely 800 GB) that has 6 discs for the data alone, and my current work assignment is into three digits of discs (that is, 100+). Please do not call 13 discs a large database.
Anyhow, SHOULD the need arise to distribute data, the way to go is not a UNION but partitioned tables (again a standard SQL Server feature, albeit Enterprise Edition only).
Get a decent server.
Oh, hardware. Get one of the SuperMicro boxes built for databases, 2 to 4 rack units high, with a SAS backplane and 24 to 72 disc slots. Yes, one computer.
Scrap that monthly blabla table crap that someone came up with who obviously should not work with databases. Put it all in one table. Use filegroups and multiple data files to handle load distribution onto the various discs for all tables. Unless...
...you actually realize that running discs like that is gross negligence. A RAID 5, RAID 6, or RAID 10 is in order; otherwise your server is possibly down when a disc fails (which will happen), and restoring a 600 GB database takes time. I run RAID 10 for my data discs, but then I privately have tables with about a billion rows (and at work we add about that many a day). Given the SMALL size of the database, a couple of SSDs would also help... their IOPS budget means you could possibly go down to 2-3 discs and get a lot more speed out. If that is not possible, my bet is that those discs are slow 3.5" discs at 7200 RPM... an upgrade to enterprise-level discs would help. I personally use 300 GB VelociRaptors for databases, but there are 15k SAS discs to be had ;)
Anyhow, this sounds really badly set up. So bad that I would either be happy my trainee came up with something that smart (as it would definitely be over a trainee's head), or my developer would stop working for me the moment I found that out (on grounds of gross incompetence; feel free to challenge that in court).
Reorganize it. Also be careful with any batch processing - those NEED to be time-staggered so they do not overlap with backups. There is only so much IO a mere simple low-speed disc can deliver.