How to store such an amount of data efficiently? A database or what?
I have to build an application that will check 35 items for changes each second. Each item has 3 values that will each fit into 5 bytes, so 15 bytes per item. The values will not change every second, but there is no pattern; maybe they change continuously, or maybe they stall for a while...
So I did a small calculation, and I found that, storing all the fields each second in a relational (SQL) database, I would have:

35 items * 15 bytes * 60 seconds * 60 minutes * 24 hours * 365 days ≈ 16.5 GB a year.
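The back-of-the-envelope figure checks out; a quick script (counting only the raw values, with no per-row overhead such as timestamps or indexes, matching the estimate above):

```python
# Back-of-the-envelope storage estimate for sampling every second.
ITEMS = 35
BYTES_PER_ITEM = 15               # 3 values * 5 bytes each
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

total_bytes = ITEMS * BYTES_PER_ITEM * SECONDS_PER_YEAR
print(f"{total_bytes / 1e9:.1f} GB per year")  # ~16.6 GB, matching the rough 16.5 GB figure
```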
This is too much for an SQL database. What would you do to reduce the size of the data? I was thinking of storing the data only when there is a change, but then you also need to store when the change happened, and if the data changes too often this can require more space than the other approach.
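The store-only-on-change idea could be sketched like this; a minimal illustration in Python, where the (timestamp, item, value) record layout and the sample data are my own assumptions, not part of the question:

```python
# Store a (timestamp, item, value) record only when the value differs
# from the last one seen for that item.
def changed_records(samples):
    """samples: iterable of (timestamp, item_id, value) taken every second."""
    last = {}          # item_id -> last stored value
    out = []
    for ts, item, value in samples:
        if last.get(item) != value:
            out.append((ts, item, value))
            last[item] = value
    return out

samples = [
    (0, "Item1A", 1.33), (1, "Item1A", 1.33),  # unchanged -> skipped
    (2, "Item1A", 1.73), (3, "Item1A", 1.73),  # unchanged -> skipped
]
print(changed_records(samples))  # [(0, 'Item1A', 1.33), (2, 'Item1A', 1.73)]
```

As noted above, each stored record now carries its own timestamp, so if the values change nearly every second this layout can end up larger than the fixed-rate one.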
I don't know whether there are other repositories, besides SQL databases, that would fit my requirements better.
What do you think?
EDIT: More information.
There is no relation between the data other than any I could create to save space. I just need to store this data and query it. The data could look like this (putting it all in one table and saving the data each second):
Timestamp Item1A Item1B Item1C Item2A Item2B ....
whatever 1.33 2.33 1.04 12.22 1.22
whatever 1.73 2.33 1.04 12.23 1.32
whatever 1.23 2.33 1.34 12.22 1.22
whatever 1.33 2.31 1.04 12.22 1.21
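In a database, the one-wide-table layout above would correspond to something like the following sketch (SQLite in memory is used purely for illustration; the column names just extend the Item1A/Item1B/... pattern of the sample, and the real table would continue up to item 35):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One wide row per second: a timestamp plus 3 columns per item (truncated here).
con.execute("""
    CREATE TABLE readings (
        ts     INTEGER PRIMARY KEY,   -- one row per second
        item1a REAL, item1b REAL, item1c REAL,
        item2a REAL, item2b REAL, item2c REAL
        -- ... and so on, 3 columns per item, up to item35c
    )
""")
con.execute("INSERT INTO readings VALUES (0, 1.33, 2.33, 1.04, 12.22, 1.22, 0.5)")
print(con.execute("SELECT item1a FROM readings WHERE ts = 0").fetchone())  # (1.33,)
```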
I feel there must be better solutions than this approach...
EDIT 2:
I will usually query the data about the values of one item over time; I usually won't query data from more than one item...
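Given that access pattern, one alternative worth sketching is a narrow layout: one row per item per second, with a composite key on (item, timestamp), so a single item's history is a cheap index range scan. This is only an illustrative sketch (table and column names are mine, and in-memory SQLite is just a stand-in):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE readings (
        ts   INTEGER NOT NULL,    -- seconds since some epoch
        item INTEGER NOT NULL,    -- 1..35
        a REAL, b REAL, c REAL,   -- the item's three values
        PRIMARY KEY (item, ts)    -- makes per-item range queries an index scan
    )
""")
rows = [(0, 1, 1.33, 2.33, 1.04), (1, 1, 1.73, 2.33, 1.04),
        (0, 2, 12.22, 1.22, 0.5)]
con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)

# The common query: one item's values over a time range.
hist = con.execute(
    "SELECT ts, a, b, c FROM readings WHERE item = 1 AND ts BETWEEN 0 AND 10 ORDER BY ts"
).fetchall()
print(hist)  # [(0, 1.33, 2.33, 1.04), (1, 1.73, 2.33, 1.04)]
```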
Since when is it too much?
That's really peanuts for almost any RDBMS out there (~17GB of data every year).
MySQL can do it, so can PostgreSQL, Firebird and plenty others but not the likes of Sqlite. I'd pick PostgreSQL myself.
Having SQL databases with hundreds of TB of data is not that uncommon these days, so 17GB is nothing to think about, really. Let alone 170GB in 10 years (with the machines of the time).
Even if it gets to 30GB a year to account for other data and indexes, that's still OK for an SQL database.
Edit
Considering your structure, it looks solid to me: the minimal things you need are already there, and there are no extras you don't need.

You can't get any better than that without using tricks that have more disadvantages than advantages.
I'm currently considering using compressed files instead of an SQL database. I will keep the post updated with the info I get.
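For what it's worth, that idea is easy to prototype: consecutive samples repeat heavily, so a general-purpose compressor such as gzip should shrink them considerably. A sketch with made-up sample data:

```python
import gzip

# One CSV line per second; repetitive values compress well.
lines = ["%d,1.33,2.33,1.04" % ts for ts in range(3600)]  # one hour of samples
raw = "\n".join(lines).encode()
packed = gzip.compress(raw)
print(len(raw), len(packed))  # the compressed output is far smaller than the raw CSV
```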