With that amount of data I think you should look into NoSQL (MongoDB, Cassandra, HBase, etc.). With MySQL you have to scale your servers a lot. We tried doing ~1,200 inserts/sec and MySQL failed (or the hardware did). The solution was XCache (memcached had also failed at the time). Try looking into NoSQL; you'll like it.
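To make that concrete, here's a minimal sketch of the Cassandra write path using the DataStax Python driver; the keyspace, table, and schema are hypothetical stand-ins, not anything from the question:

```python
from cassandra.cluster import Cluster

# Hypothetical schema, assumed to exist already:
#   CREATE TABLE metrics.samples (
#       sensor_id int, ts bigint, value double,
#       PRIMARY KEY (sensor_id, ts));
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("metrics")

insert = session.prepare(
    "INSERT INTO samples (sensor_id, ts, value) VALUES (?, ?, ?)"
)

# Prepared statements plus async execution is the usual way to push
# write throughput well past row-at-a-time synchronous inserts.
futures = [session.execute_async(insert, (42, i, 3.14)) for i in range(1000)]
for f in futures:
    f.result()  # block until acknowledged; raises on write failure
```

Whether any of these stores hits your numbers still depends on hardware and replication settings, so benchmark with your own workload.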
4B rows x 30 x 4 bytes is about 1/2 terabyte a day. I don't think you're going to be able to keep this on a single machine, and your SAN may have trouble too. I would look at Cassandra, as it's built for high write volumes.
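A quick back-of-envelope check of that estimate (using the question's 4 billion rows and 30 four-byte columns per row):

```python
rows_per_day = 4_000_000_000        # 4B rows/day
bytes_per_row = 30 * 4              # 30 columns x 4 bytes each
tb_per_day = rows_per_day * bytes_per_row / 1e12
print(f"{tb_per_day:.2f} TB/day")   # 0.48 TB/day, before indexes or replication
```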
If I were you, I'd separate the solution into data capture and data analysis servers; this is a fairly common pattern. Queries and reporting run against your data warehouse (where you may be able to use a different schema to your data collection system). You load data into your data warehouse using an ETL (extract, transform, load) process, which in your case could be very simple.
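As a sketch of how simple that load step could look, assuming the capture servers land flat CSV files (the file name, table, and columns here are made up, and sqlite3 just stands in for whatever warehouse connection you use):

```python
import csv
import sqlite3  # stand-in for your warehouse's DB-API driver

warehouse = sqlite3.connect("warehouse.db")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS samples (sensor_id INTEGER, ts INTEGER, value REAL)"
)

# Extract from a capture file, transform (trivial type coercion), load in bulk.
with open("capture_20120101.csv", newline="") as f:
    rows = ((int(r[0]), int(r[1]), float(r[2])) for r in csv.reader(f))
    # One transaction via executemany is far cheaper than per-row commits.
    warehouse.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
warehouse.commit()
```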
As to how you would support 70K writes per second - I'd say this is well beyond the capabilities of most RDBMS servers unless you have a dedicated team and infrastructure. It's not something you want to be learning on the job.
NoSQL seems a better match.