But the main feature required besides those is running big analyses on the database at maximum speed.
So now all you need is 90TB+ of RAM and you're set. "Maximum" speed is a very relative concept.
I have got about 90TB of text in ~200 tables. This is structured, related data. Any true relational, distributed, and performant database would do the job.
What is a "true relational distributed database"?
Let's flip this around. Let's say that you had 90 servers and they each held 1TB of data. What's your plan to perform joins amongst your 200 tables and 90 servers?
In general, cross-server joins don't scale very well. Trying to run joins across 90 servers is probably going to scale even worse. Partitioning 200 tables is a lot of work.
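To illustrate why cross-server joins get expensive, here is a minimal toy model (table and column names are hypothetical, not from the question): rows are hash-partitioned across servers by one key, so joining on any *other* column forces a shuffle in which rows must move between servers before any local joining can happen.

```python
# Toy model of a distributed hash join across N "servers".
# A row lives on the server given by hash(partition_key) % N_SERVERS.
# Joining on a column that is NOT the partition key forces a shuffle:
# rows may have to travel over the network to reach matching rows.

N_SERVERS = 3

def server_for(key):
    return hash(key) % N_SERVERS

# orders is partitioned by order_id, but we want to join on customer_id
orders = [(1, "alice"), (2, "bob"), (3, "alice")]   # (order_id, customer_id)
customers = [("alice", "US"), ("bob", "DE")]        # (customer_id, country)

# Step 1: shuffle both tables so matching join keys land on the same server.
shuffled_orders = {s: [] for s in range(N_SERVERS)}
shuffled_customers = {s: [] for s in range(N_SERVERS)}
moved = 0
for order_id, cust in orders:
    dest = server_for(cust)
    if dest != server_for(order_id):
        moved += 1  # this row crosses the network
    shuffled_orders[dest].append((order_id, cust))
for cust, country in customers:
    shuffled_customers[server_for(cust)].append((cust, country))

# Step 2: each server joins its local buckets independently.
result = []
for s in range(N_SERVERS):
    lookup = dict(shuffled_customers[s])
    for order_id, cust in shuffled_orders[s]:
        if cust in lookup:
            result.append((order_id, cust, lookup[cust]))

print(sorted(result))
```

The shuffle in step 1 is the part that dominates at scale: with 90 servers, a join on a non-partition key can move nearly the whole table across the network, and with 200 tables there are many such key combinations to plan for.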
Which other databases should we generally keep track of in this context, and which should we drop off the list?
OK, so there are lots of follow-up questions here:
What are you running right now?
What are your pain points?
Are you really planning to just drop in a new system?
Is there a smaller sub-system that can be tested on first?
If you have 200 tables, how many different queries are you running? Thousands?
How do you plan to test that queries are behaving correctly?
Sounds like a good fit for Cassandra + Hadoop. This is possible with a little effort today; DataStax (where I work) is introducing Brisk (also open source) to make it easier: http://www.datastax.com/products/brisk
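To make that concrete: Brisk exposes Cassandra data to Hadoop via Hive, so a big analysis would typically be expressed as a SQL-like Hive query that Hadoop runs in parallel across the cluster. A hedged sketch (the table and column names here are hypothetical, not from the question):

```sql
-- Hypothetical HiveQL analysis over a Cassandra-backed table.
-- Hive compiles this into MapReduce jobs that run across all nodes.
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total
FROM orders
WHERE order_date >= '2011-01-01'
GROUP BY customer_id
ORDER BY total DESC
LIMIT 100;
```

The point is that the aggregation is pushed to the data rather than pulling 90TB to one machine; joins are still possible in Hive, but are batch map-side/reduce-side joins rather than interactive relational joins.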