TPC or other database benchmarks with SSD drives
I have been interested in SSD drives for quite some time. I do a lot of work with databases, and I've been quite interested to find benchmarks such as TPC-H performed with and without SSD drives.
On the face of it, it sounds like there should be one, but unfortunately I haven't been able to find one. The closest I've found to an answer is the first comment on this blog post:
http://dcsblog.burtongroup.com/data_center_strategies/2008/11/intels-enterprise-ssd-performance.html
The fellow who wrote it seemed to be a pretty big naysayer when it came to SSD technology in the enterprise, claiming poor performance on mixed read/write workloads.
There have been other benchmarks, such as this and this, that show absolutely ridiculous numbers. While I don't doubt them, I'm curious whether what the commenter in the first link said is in fact true.
Anyway, if anybody can find benchmarks done with DBs on SSDs, that would be excellent.
Comments (5)
I've been testing and using them for a while, and whilst I have my own opinions (which are very positive), I think Anandtech.com's testing write-up is far better than anything I could have written. See what you think:
http://www.anandtech.com/show/2739
Regards,
Phil.
The issue with SSDs is that they make real sense only when the schema is normalized to 3NF or 5NF, thus removing "all" redundant data. Moving a "denormalized for speed" mess to SSD will not be fruitful; the mass of redundant data will make the SSD cost-prohibitive.
Doing that for an existing application means redefining the existing tables (references) as views, encapsulating the normalized tables behind the curtain. There is a time penalty on the engine's CPU to synthesize the rows. The more denormalized the original schema, the greater the benefit of refactoring before moving to SSD. Even on SSD, denormalized schemas will likely run slower, due to the mass of data which must be retrieved and written.
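As a minimal sketch of that refactoring, assuming a hypothetical denormalized orders_wide table split into normalized customers and orders tables, with a view keeping the old name alive for existing queries (Python's sqlite3 is used here purely for illustration; the names are made up):

```python
import sqlite3

# Hypothetical illustration: a denormalized "orders_wide" table is split into
# normalized tables, and a view keeps the old name working for existing queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_total REAL NOT NULL
    );
    -- Existing applications keep querying "orders_wide"; the engine now
    -- synthesizes each row with a join (the CPU penalty mentioned above).
    CREATE VIEW orders_wide AS
        SELECT o.order_id, o.order_total, c.customer_id, c.customer_name
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id;
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO orders VALUES (100, 1, 42.50)")
print(conn.execute("SELECT * FROM orders_wide").fetchall())
# -> [(100, 42.5, 1, 'Acme Corp')]
```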
Putting logs on SSD is not indicated; logging is a sequential, write-mostly (write-only, under normal circumstances) operation, and the physics of flash SSDs argue against it. (A company named Texas Memory Systems has long built RAM-based subsystems, which are a different story.) Conventional rust drives, duly buffered, will do fine.
Note the Anandtech articles: the Intel drive was the only one which worked right. That will likely change by the end of 2009, but as of now only the Intel drives qualify for serious use.
I've been running a fairly large SQL2008 database on SSDs for 9 months now (600 GB, over 1 billion rows, 500 transactions per second). I would say that most SSD drives I tested are too slow for this kind of use. But if you go with the upper-end Intel drives and carefully pick your RAID configuration, the results will be awesome. We're talking 20,000+ random reads/writes per second. In my experience, you get the best results if you stick with RAID 1.
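For the curious, here is a rough sketch of how one might probe random-read rates on a drive. This is not the poster's methodology; real tests of this kind typically used tools like Iometer or fio with O_DIRECT and many outstanding I/Os, and the path, sizes, and duration below are arbitrary assumptions (Unix-only, since it uses os.pread):

```python
import os
import random
import time

# Crude random-read probe (illustrative only). The OS page cache will
# inflate this number far beyond true device IOPS.
PATH = "testfile.bin"
BLOCK = 8192                    # 8 KB, a common database page size
FILE_SIZE = 256 * 1024 * 1024   # 256 MB scratch file
SECONDS = 5.0

with open(PATH, "wb") as f:     # create a sparse scratch file once
    f.truncate(FILE_SIZE)

fd = os.open(PATH, os.O_RDONLY)  # os.pread requires Unix
blocks = FILE_SIZE // BLOCK
deadline = time.time() + SECONDS
reads = 0
while time.time() < deadline:
    os.pread(fd, BLOCK, random.randrange(blocks) * BLOCK)
    reads += 1
os.close(fd)

print(f"~{reads / SECONDS:,.0f} random {BLOCK}-byte reads/sec")
```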
I can't wait for Intel to ship the 320GB SSDs! They are expected to hit the market in September 2009...
The formal TPC benchmarks will probably take a while to appear using SSD, because there are two parts to the TPC benchmark: the speed (transactions per unit time) and the cost per (transaction per unit time). With the high speed of SSD, you have to scale the size of the DB even larger, thus using more SSD, and thus costing more. So, even though you might get superb speed, the cost is still prohibitive for a fully-scaled (auditable, publishable) TPC benchmark. This will remain true for a while yet, likely a few years, as long as SSD remains more expensive than the corresponding quantity of spinning disk.
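To make the two-part metric concrete, here is a toy calculation with entirely made-up numbers (neither row reflects any published result), showing how a faster SSD system can still lose on price/performance:

```python
# Toy arithmetic for the two-part TPC metric described above.
systems = [
    ("disk-based", 20_000, 15_000.0),   # (name, QphH, total system cost in $)
    ("SSD-based",  35_000, 120_000.0),
]

for name, qph, cost in systems:
    # TPC-H reports raw speed (QphH) and price/performance ($/QphH) separately.
    print(f"{name}: {qph:,} QphH at ${cost / qph:.2f} per QphH")

# The SSD system wins on raw speed (35,000 vs 20,000 QphH) yet loses badly
# on $/QphH ($3.43 vs $0.75), because the fully-scaled database needs so
# much expensive flash.
```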
Commenting on:
"...quite interested to find benchmarks such as TPC-H performed with and without SSD drives."
(FYI and full disclosure, I am pseudonymously "J Scouter", the "pretty big naysayer when it came to SSD technology in the enterprise" referred to and linked above.)
So... here's the first clue to emerge.
Dell and Fusion-IO have published the first EVER audited benchmark using a Flash-memory device for storage.
The benchmark is TPC-H, which is a "decision support" benchmark. This is important because TPC-H entails an almost exclusively read-only workload pattern, a perfect context for SSD, since it completely avoids the write performance problem.
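For readers unfamiliar with the workload: TPC-H queries are scan-and-aggregate reads. A sketch loosely modeled on TPC-H Q1, the pricing summary (abbreviated schema and aggregate list, illustrative only, using Python's sqlite3):

```python
import sqlite3

# Loosely modeled on TPC-H Q1: a pure scan-and-aggregate read, no writes.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lineitem (
    l_returnflag TEXT, l_linestatus TEXT,
    l_quantity REAL, l_extendedprice REAL, l_discount REAL,
    l_shipdate TEXT)""")
conn.execute("INSERT INTO lineitem VALUES ('N', 'O', 17, 21168.23, 0.04, '1998-01-01')")

rows = conn.execute("""
    SELECT l_returnflag, l_linestatus,
           SUM(l_quantity)                         AS sum_qty,
           SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
           COUNT(*)                                AS count_order
    FROM lineitem
    WHERE l_shipdate <= '1998-09-02'
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus
""").fetchall()
print(rows)
```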
In the scenarios painted for us by the Flash SSD hypesters, this application represents a soft-pitch, a gentle lob right over the plate and an easy "home run" for a Flash-SSD database application.
The results? The very first audited benchmark for a flash-SSD-based database application, and a READ-ONLY one at that, resulted in (drum roll here)... a fifth-place finish among comparable (100 GB) systems tested.
This Flash SSD system produced about 30% as many Queries-per-hour as a disk-based system result published by Sun...in 2007.
Surely though it will be in price/performance that this Flash-based system will win, right?
At $1.46 per query-per-hour, the Dell/Fusion-IO system finishes in third place, at more than twice the cost-per-query-per-hour of the best cost/performance disk-based system.
And again, remember this is for TPC-H, a virtually "read-only" application.
This is pretty much exactly in line with what the MS Cambridge research team discovered over a year ago: that there are no enterprise workloads where Flash makes ROI sense from an economic or energy standpoint.
Can't wait to see TPC-C, TPC-E, or SPC-1, but according to the research paper linked above, SSDs will need to become orders of magnitude cheaper before they ever make sense in enterprise apps.