Need a scalable, distributed data store that reads data extremely fast and works with .NET
I currently have a data solution built on an RDBMS. The load on the server will grow by 10x, and I do not believe it will scale.
I believe what I need is a data store that is fault tolerant and scalable and that can retrieve data extremely fast.
The Stats
Records: 200 million
Total Data Size (not including indexes): 381 GB
New records per day: 200,000
Queries per second: 5,000
Records returned per query: 1 to 2,000
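The stats above can be turned into a few back-of-the-envelope numbers. This is a minimal sketch under assumptions the question does not state: 1 GB = 10^9 bytes, a 365-day year, and that the "10x" load growth applies to the query rate. The function name `sizing` is hypothetical.

```python
# Back-of-the-envelope sizing from the stats in the question.
# Assumptions (not from the question): 1 GB = 10**9 bytes, a 365-day year,
# and the "10x" growth applies to queries per second.

GB = 10**9


def sizing(total_gb, records, new_per_day, qps, max_rows, growth=10):
    avg_bytes = total_gb * GB / records          # average record size in bytes
    daily_gb = new_per_day * avg_bytes / GB      # data added per day
    yearly_gb = daily_gb * 365                   # data added per year
    future_qps = qps * growth                    # query rate after the growth
    # Worst case: every query returns the maximum number of full records.
    worst_gb_per_s = qps * max_rows * avg_bytes / GB
    return {
        "avg_record_bytes": avg_bytes,
        "daily_growth_gb": daily_gb,
        "yearly_growth_gb": yearly_gb,
        "future_qps": future_qps,
        "worst_case_gb_per_s": worst_gb_per_s,
        "worst_case_gb_per_s_after_growth": worst_gb_per_s * growth,
    }


if __name__ == "__main__":
    for name, value in sizing(381, 200_000_000, 200_000, 5_000, 2_000).items():
        print(f"{name}: {value:,.3f}")
```

With these inputs the average record is about 1.9 KB, growth is roughly 0.38 GB per day (about 139 GB per year), and the query rate would reach 50,000 per second. The worst-case transfer figure (about 19 GB/s today) is only an upper bound; real traffic depends on how many rows and columns typical queries return, which the question does not say.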
Requirements
Very fast reads
Scalable
Fault tolerant
Able to execute complex queries (conditions across many columns)
Range Queries
Distributed
Partitioning: is this required for 381 GB of data?
Able to reload from file
In-Memory (not sure)
Not Required
ACID transactions
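To make the partitioning question in the requirements concrete, here is a minimal sketch of hash partitioning. The partition count and the use of a non-negative integer record ID as the partition key are hypothetical, since the question names no key. `zlib.crc32` is used because Python's built-in `hash()` is randomized per process for strings and bytes, so it would not route consistently across nodes.

```python
import zlib
from collections import defaultdict

# Hypothetical setup: records are keyed by a non-negative integer ID and
# spread over a fixed number of partitions (nodes).


def partition_for(key: int, num_partitions: int) -> int:
    """Map a record key to a partition deterministically."""
    return zlib.crc32(key.to_bytes(8, "big")) % num_partitions


def distribute(keys, num_partitions):
    """Group keys by the partition that owns them."""
    parts = defaultdict(list)
    for k in keys:
        parts[partition_for(k, num_partitions)].append(k)
    return parts


def range_query(parts, low, high):
    """Under hash partitioning a key range cannot be routed to a single
    partition, so the query is sent to every partition and the results
    are merged (scatter-gather)."""
    hits = []
    for keys in parts.values():
        hits.extend(k for k in keys if low <= k <= high)
    return sorted(hits)


if __name__ == "__main__":
    n = 4
    parts = distribute(range(10_000), n)
    print({p: len(ks) for p, ks in sorted(parts.items())})
    print("approx. GB per partition:", 381 / n)
    print(range_query(parts, 100, 105))
```

The trade-off matters for the "Range Queries" requirement: hashing spreads load evenly but turns every range query into a fan-out to all partitions, whereas range partitioning keeps neighbouring keys together at the risk of hot spots. Whether 381 GB needs partitioning at all depends on the hardware; a single well-provisioned server may be enough.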
The primary purpose of the data store is to retrieve data very fast. The queries that will access this data will have conditions across many different columns (30 columns, and probably many more). I hope this is enough information.
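Queries with conditions on many columns are usually generated dynamically. The sketch below builds a parameterized `SELECT` that ANDs equality and range conditions, and runs it against an in-memory SQLite table so it is self-contained. The table and column names are hypothetical stand-ins for the real 30+ columns, and SQLite is used only for illustration.

```python
import sqlite3

# Hypothetical column names standing in for the real 30+ columns.
ALLOWED_COLUMNS = {"region", "status", "price", "created_day"}


def build_query(table, equals=None, ranges=None):
    """Build a parameterized SELECT that ANDs equality and inclusive range
    conditions on many columns. Column names cannot be bound as parameters,
    so they are checked against a whitelist; values always use placeholders.
    The table name is assumed to be a trusted constant."""
    clauses, params = [], []
    for col, value in (equals or {}).items():
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
        clauses.append(f"{col} = ?")
        params.append(value)
    for col, (low, high) in (ranges or {}).items():
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
        clauses.append(f"{col} BETWEEN ? AND ?")
        params.extend([low, high])
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def demo():
    """Run one multi-column query against a tiny in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, region TEXT, "
        "status TEXT, price REAL, created_day INTEGER)"
    )
    con.executemany(
        "INSERT INTO records (region, status, price, created_day) VALUES (?, ?, ?, ?)",
        [("EU", "open", 10.0, 1), ("EU", "open", 25.0, 2),
         ("US", "open", 15.0, 2), ("EU", "closed", 12.0, 3)],
    )
    sql, params = build_query(
        "records",
        equals={"region": "EU", "status": "open"},
        ranges={"price": (5, 20)},
    )
    rows = con.execute(sql, params).fetchall()
    con.close()
    return sql, params, rows


if __name__ == "__main__":
    print(demo())
```

Only the first sample row matches all three conditions. Keeping values in placeholders lets the database reuse plans for the same query shape and avoids SQL injection.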
I have read about many different types of data stores, including NoSQL, in-memory, distributed hash, key-value, information retrieval libraries, document stores, structured storage, distributed databases, tabular stores and others. On top of that, there are over two dozen products that implement these database types. That is a lot to digest when trying to figure out which would provide the best solution.
Preferably, the solution should run on Windows and be compatible with Microsoft .NET.
Based on the information above, does anyone have any suggestions, and why?
Thanks
Answer
So, what is your problem? I do not really see anything nontrivial here.
Fast and scalable: grab a database (sorry, but complex queries across many columns mean a database) and get a NICE SAN; an HP EVA is great. I have seen one deliver 800 MB per second of random I/O reads for a database, using 190 SAS disks. Fast enough for you? Sorry, but THIS is scalability.
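The throughput figure quoted above can be related to the question's query rate with simple arithmetic. This sketch assumes (none of this is stated in the answer) that "800mb" means 800 MiB/s, that each read is one 8 KiB page (SQL Server's page size), that every read goes to disk rather than to cache, and that the question's 10x growth applies to queries per second. The function names are hypothetical.

```python
# Relate the quoted SAN throughput to the question's query rate.
# Assumptions: 800 MiB/s, one 8 KiB page per read, no caching,
# and 10x growth meaning 50,000 queries per second.

MIB = 1024 * 1024
PAGE = 8 * 1024


def reads_per_second(throughput_mib_s, io_bytes=PAGE):
    """Number of io_bytes-sized reads per second at the given throughput."""
    return throughput_mib_s * MIB / io_bytes


def per_disk(total, disks):
    """Share of a total spread evenly over the disks."""
    return total / disks


def reads_per_query(reads_s, qps):
    """Page-read budget per query at a given query rate."""
    return reads_s / qps


if __name__ == "__main__":
    total = reads_per_second(800)
    print(f"8 KiB reads/s: {total:,.0f}")  # 102,400
    print(f"reads/s per disk: {per_disk(total, 190):,.1f}")
    print(f"MiB/s per disk: {per_disk(800, 190):.2f}")
    for qps in (5_000, 50_000):
        print(f"{qps:,} qps -> {reads_per_query(total, qps):.2f} page reads per query")
```

Under these assumptions the array would allow roughly 20 uncached page reads per query at today's 5,000 queries per second, but only about 2 at 50,000, so caching and indexing decide whether it is "fast enough". The per-disk figure is worth comparing with the random-read rating of the disks actually used.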
A 400 GB database is not remarkable by any means.
Finally, get a pro to tune your database server(s). It is that simple. SQL Server is a lot more complicated to use properly than "OK, I just know what a SELECT should look like" (without really knowing).
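One basic part of tuning is checking whether a multi-column query can use an index. SQLite stands in for SQL Server here purely so the example runs anywhere (SQL Server exposes the same information through its execution plans); the table, columns and index name are hypothetical. The sketch compares the query plan before and after adding a composite index.

```python
import sqlite3

# SQLite is a stand-in for SQL Server; table, columns and index are hypothetical.


def plan(con, sql, params=()):
    """Return the 'detail' column (last field) of EXPLAIN QUERY PLAN rows."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


def demo():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, region TEXT, "
        "status TEXT, price REAL)"
    )
    sql = ("SELECT id FROM records "
           "WHERE region = ? AND status = ? AND price BETWEEN ? AND ?")
    params = ("EU", "open", 5, 20)
    before = plan(con, sql, params)  # no usable index: full table scan
    # Composite index: equality columns first, the range column last.
    con.execute(
        "CREATE INDEX idx_region_status_price ON records (region, status, price)"
    )
    after = plan(con, sql, params)   # planner can now seek into the index
    con.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

Putting the equality columns first lets the B-tree narrow to one (region, status) prefix and then read a contiguous price range; reversing that order would limit how much of the index the range condition can use.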