Can I expect a significant performance boost by moving a large key/value store from MySQL to a NoSQL database?
I'm developing a database that holds large scientific datasets. In the typical usage scenario, on the order of 5GB of new data is written to the database every day, and 5GB is also deleted each day. The total database size will be around 50GB. The server I'm running on will not be able to hold the entire dataset in memory.
I've structured the database such that the main data table is just a key/value store consisting of a unique ID and a Value.
Queries are typically for around 100 consecutive values, e.g.
SELECT Value WHERE ID BETWEEN 7000000 AND 7000100;
I'm currently using MySQL / MyISAM, and these queries take on the order of 0.1 - 0.3 seconds, but recently I've come to realize that MySQL is probably not the optimal solution for what is basically a large key/value store.
Before I start doing lots of work installing the new software and rewriting the whole database I wanted to get a rough idea of whether I am likely to see a significant performance boost when using a NoSQL DB (e.g. Tokyo Tyrant, Cassandra, MongoDB) instead of MySQL for these types of retrievals.
Thanks
Comments (3)
Please also consider OrientDB. Its indexes use an RB+Tree algorithm. In my tests on a 100GB database, reads of 100 items took 0.001-0.015 seconds on my laptop, though this depends on how the keys/values are distributed inside the index.
Running your own test with it should take less than an hour.
One piece of bad news is that OrientDB does not support a clustered configuration yet (planned for September 2010).
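A quick test of this kind can be sketched as a small, store-agnostic timing harness. This is only a rough sketch under stated assumptions: `time_range_reads` and `make_sqlite_fetch` are hypothetical helper names, the table and column names (`kv`, `id`, `value`) are made up for illustration, and Python's built-in `sqlite3` is used purely as a stand-in store so the example runs anywhere. It is not OrientDB, MySQL, or any of the NoSQL systems discussed here.

```python
import random
import sqlite3
import statistics
import time


def time_range_reads(fetch, max_id, span=100, trials=200, seed=0):
    """Time `trials` reads of IDs start..start+span; return latencies in seconds.

    `fetch(lo, hi)` is any callable returning the rows with lo <= id <= hi,
    so the same harness can wrap whichever store is being evaluated.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(trials):
        start_id = rng.randrange(0, max_id - span)
        t0 = time.perf_counter()
        rows = fetch(start_id, start_id + span)
        latencies.append(time.perf_counter() - t0)
        # Sanity check: an inclusive range over dense IDs has span + 1 rows.
        assert len(rows) == span + 1
    return latencies


def make_sqlite_fetch(n_rows):
    """Stand-in store (assumption): in-memory SQLite table with dense IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, value BLOB)")
    conn.executemany(
        "INSERT INTO kv (id, value) VALUES (?, ?)",
        ((i, b"x" * 64) for i in range(n_rows)),
    )
    conn.commit()

    def fetch(lo, hi):
        cur = conn.execute(
            "SELECT value FROM kv WHERE id BETWEEN ? AND ?", (lo, hi)
        )
        return cur.fetchall()

    return fetch


if __name__ == "__main__":
    n = 100_000
    lat = sorted(time_range_reads(make_sqlite_fetch(n), max_id=n))
    print(f"median {statistics.median(lat) * 1000:.3f} ms, "
          f"p95 {lat[int(0.95 * len(lat))] * 1000:.3f} ms")
```

To compare candidates, replace `make_sqlite_fetch` with a function that issues the same range read against each store. Note that the in-memory stand-in says nothing about the case in the question: a meaningful run needs a dataset larger than the server's RAM, with random start IDs so the reads are not all served from cache.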
I use MongoDB in production for a write-intensive workload where I comfortably exceed the rates you mention for both writes and reads. The database is around 90GB, and a single instance (Amazon m1.xlarge) handles 100 QPS. A typical key->value query takes about 1-15ms on a database with 150M entries, with query times reaching 30-50ms under heavy load.
At any rate, 200ms is way too much for a key/value store.
If you only use a single commodity server, I would suggest MongoDB, as it is quite efficient and easy to learn.
If you are looking for a distributed solution, you can try any Dynamo clone;
Cassandra (Facebook) and Project Voldemort (LinkedIn) are the most popular.
Keep in mind that requiring strong consistency slows these systems down quite a bit.
I would expect Cassandra to do better than a B-tree based system like TC, MySQL, or MongoDB when the dataset does not fit in memory. Of course, Cassandra is also designed so that if you need more performance, it's trivial to add more machines to support your workload.