Which database (DBMS) best handles large tables?
I also have a very large table in SQL Server (2008 R2 Developer Edition) that is having some performance problems.
I was wondering if another DBMS would be better for handling large tables. I'm mainly only considering the following systems: SQL Server 2008, MySQL, and PostgreSQL 9.0.
Or, as the referenced question above alludes to, are table size and performance mainly a factor of indexes and caching?
Also, would greater normalization improve performance, or hinder it?
Edit:
One of the comments below claims I was vague. I have over 20 million rows (20 years of stock data & 2 years of options data), and I am trying to figure out how to improve performance by an order of magnitude. I only care about read/calculation performance; I don't care about write performance. The only writes are during data refreshes, and those are BulkCopy.
I have some indexes already, but hopefully I'm doing something wrong because I need to speed things up a lot. I need to start looking at my queries too.
The comments and answers provided already helped me understand how to start profiling my database. I'm a programmer, not a DBA (therefore Marco's book recommendation is perfect). I don't have that much database experience and I've never profiled a database before. I will try these suggestions and report back if necessary. Thank you!
Comments (6)
80M rows is not big. You just need to learn how to design and query data of that size. That might include normalization, denormalization, clustering, and indexing, but very often the tradeoffs are deeper than they seem. Adding indexes can actually hurt performance, even for reading — for instance, if the optimizer is not good enough or decides based on the wrong statistics.
I suggest you read Refactoring SQL Applications because it approaches the problem not from a "DB tuner" but from a developer's point of view.
The book is by the author of The Art of SQL and compares Oracle, SQL Server and MySQL under many scenarios. It's pragmatic and comes with some useful graphs.
I would stay away from MySQL unless forced to. Postgres 9.0 rocks according to several definitions of "rock" but I would still use 8.4 in production for a few months.
And if you want people to help you with this table, provide as many details as possible: schema, indexes, data distribution, pattern of usage, etc.
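For a read-mostly time-series workload like the poster's, a covering index is often the first thing worth profiling. A minimal T-SQL sketch — all table and column names here are hypothetical, not taken from the question:

```sql
-- Hypothetical stock-price table; names are illustrative only.
CREATE TABLE dbo.StockPrices (
    Symbol     varchar(10)   NOT NULL,
    TradeDate  date          NOT NULL,
    ClosePrice decimal(18,4) NOT NULL,
    Volume     bigint        NOT NULL
);

-- A composite index whose key matches the common predicate
-- (symbol + date range) and which INCLUDEs the selected columns,
-- so queries of that shape can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_StockPrices_Symbol_Date
    ON dbo.StockPrices (Symbol, TradeDate)
    INCLUDE (ClosePrice, Volume);

-- This query can now seek the index instead of scanning the table.
SELECT TradeDate, ClosePrice, Volume
FROM dbo.StockPrices
WHERE Symbol = 'AAPL'
  AND TradeDate BETWEEN '2008-01-01' AND '2008-12-31';
```

Whether such an index actually helps depends on the real query shapes and statistics, which is exactly why the schema and usage details matter.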
Switching DBMS is not the solution.
How big is big?
What indexes does it have?
If it really is that big then can you partition it?
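If partitioning does apply, SQL Server 2008 supports it natively through a partition function and scheme. A sketch assuming yearly date-range partitions, with hypothetical names and a deliberately short boundary list:

```sql
-- Yearly range partitions; a real 20-year table would list more boundaries.
CREATE PARTITION FUNCTION pf_TradeYear (date)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

-- Map every partition to the PRIMARY filegroup for simplicity.
CREATE PARTITION SCHEME ps_TradeYear
AS PARTITION pf_TradeYear ALL TO ([PRIMARY]);

-- Hypothetical table placed on the scheme, partitioned by TradeDate,
-- so date-range queries only touch the relevant partitions.
CREATE TABLE dbo.StockPricesPartitioned (
    Symbol     varchar(10)   NOT NULL,
    TradeDate  date          NOT NULL,
    ClosePrice decimal(18,4) NOT NULL
) ON ps_TradeYear (TradeDate);
```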
You are a long way from maxing out SQL Server. If you don't address the design and indexing issues that are the source of your performance problems you will just wind up porting them to a different platform.
There isn't going to be a silver-bullet solution that will "make the db fast", or else a lot of DBAs would be out of a job. You are just going to have to do some performance profiling and fine-tune your database design and indexing strategy to get performance in line with your requirements.
Sorry, there really aren't shortcuts.
If you give more detail on the queries that are problematic in terms of performance and the underlying table structures/indexing, I'll bet the smarties on SO will be able to provide some guidance.
I think SimpleDB is the choice, considering that Amazon uses it for their platform.
Just saw this. You need to check out infobright.org. For number calculations, it's great. It provides a database engine for MySQL, but one built for analysis, not transactional updates.
The only issue you'll have is that your data set is a little small for Infobright, but it should work fine.
The two DB products to which a majority of really big companies, banks, militaries, and governments entrust huge amounts of data are Oracle and DB2. Both come with appropriately fat price tags. Both products have decades of intensive professional tuning behind them, though often the benefits are only available to people who foot (additionally!) the bill for high-powered consultants. I have a friend who is such a DB2 consultant; he charges an arm and a leg but achieves some amazing performance gains with measures other people wouldn't consider.
Neither of these is in your short list, so chances are you won't consider them. I suspect that any of the other products could handle your load too, though I have some distrust of Microsoft products. So... consider this as just information for information's sake.