What is the best way to store historical stock data at 1-minute intervals?

Posted 2024-10-29 19:01:17


I need to build a system to store the following data:

  1. 10,000 stocks
  2. For each stock, I should keep data for 1000 fields ("open", "high", .....)
  3. Each field gets updated 390 times in a day (meaning, there are 390 intervals)
  4. Overall, I have a total of 10 years of data for each stock/field/interval

Here are the requirements in terms of insertion/querying:

  1. Fast inserts of data as it comes in
  2. Retrievals would normally be as follows: give me all stocks for field 'X' on interval 'Y'. Such a query must return as fast as possible

In terms of budget, since I do not have the means to buy a huge server and something like SQL Server to store the data, a friend recommended that I look into MySQL. I tried it, but queries are extremely slow if I don't add any indices to the table. On the other hand, if I do add indices, the insertions are tremendously slow, so this does not help either.
My machine has only 2GB of memory in it, so either way, the indices will not fit in memory.

What is the best way to store such data in a scalable way (I might have even more fields as time goes by...)? Is it true that flat binary files, on a by-field/by-interval basis, are the best solution given my requirements and budget?

(If it makes any difference at all, I use Linux)
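To put the numbers in perspective, a quick back-of-envelope sizing (assuming roughly 252 trading days per year and 4-byte floats — both are assumptions, not stated in the question) shows why none of this fits in 2GB of RAM:

```python
# Back-of-envelope sizing for the dataset described above.
stocks = 10_000
fields = 1_000
intervals_per_day = 390          # one-minute bars, as stated in the question
trading_days_per_year = 252      # assumption: typical US market calendar
years = 10
bytes_per_value = 4              # assumption: single-precision float

values = stocks * fields * intervals_per_day * trading_days_per_year * years
total_bytes = values * bytes_per_value

print(f"{values:,} values, about {total_bytes / 1e12:.1f} TB raw")
```

Even before indices, the raw values alone run to tens of terabytes, so the storage layout matters far more than the choice of database engine.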


Comments (1)

葬心 2024-11-05 19:01:17


MySQL is probably not what you want if you're dealing with data you need represented faithfully and with powerful indexing. I'd suggest PostgreSQL, which is also free and generally an all-around great project(TM).

Flat binary files (or even ASCII) should be a decent solution if you don't need to manipulate the data in place afterward or do complicated joins. If you have to edit data in place in a flat file, you have an enormous chore. If you later need to add fields to a flat file, you have an even bigger chore.
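For reference, the by-field flat-binary layout the question asks about can be sketched as follows: one file per field, written interval-major, so "all stocks for field X at interval Y" becomes a single seek plus one contiguous read. The file layout, fixed stock ordering, and function names here are illustrative assumptions, not an existing format:

```python
from array import array

STOCKS = 10_000          # one float32 slot per stock, in a fixed order
RECORD = STOCKS * 4      # bytes per interval snapshot (4 bytes per float)

def append_interval(path, values):
    """Append one interval's snapshot (one float per stock) to a field file."""
    buf = array("f", values)          # float32 values, native byte order
    assert len(buf) == STOCKS
    with open(path, "ab") as f:       # appends are sequential writes: fast inserts
        f.write(buf.tobytes())

def read_interval(path, interval_index):
    """Read all stocks for one interval: one seek + one contiguous read."""
    with open(path, "rb") as f:
        f.seek(interval_index * RECORD)
        return array("f", f.read(RECORD))
```

With this layout the target query touches exactly `RECORD` bytes no matter how much history has accumulated — which is precisely why flat files are attractive for append-only data, and precisely why in-place edits and new fields are the chore described above.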

PostgreSQL handles indexing reasonably, and caches queries for performance. Indexing shouldn't pose too much of a challenge if you tune your system for the quantity of RAM you're dealing with.

I'd suggest that you avoid flat files for your needs, and if queries are still too slow even with a properly tuned RDBMS instance, find a way to shrink the amount of data that needs to be processed. Keeping each year of data in a separate table is a simplistic but easy way to do this, and searching the whole database can still be accomplished easily with joins.
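A minimal sketch of the per-year-table idea, using SQLite purely as a stand-in for PostgreSQL (table and column names are made up for illustration); a search across years stitches the yearly tables back together with UNION ALL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per year keeps each table (and its indices) much smaller.
for year in (2009, 2010):
    conn.execute(
        f"CREATE TABLE bars_{year} ("
        "  stock TEXT, field TEXT, interval_ts INTEGER, value REAL)"
    )

conn.execute("INSERT INTO bars_2009 VALUES ('AAPL', 'open', 1, 90.5)")
conn.execute("INSERT INTO bars_2010 VALUES ('AAPL', 'open', 1, 210.7)")

# A whole-history search combines the per-year tables.
rows = conn.execute(
    "SELECT value FROM bars_2009 WHERE stock = 'AAPL' AND field = 'open' "
    "UNION ALL "
    "SELECT value FROM bars_2010 WHERE stock = 'AAPL' AND field = 'open'"
).fetchall()
```

Each per-year query only scans that year's table, so indices stay small enough to be useful even on a low-memory machine.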


Edit: Another neat thing you can do is partition your data table. This has all kinds of benefits, like letting you move data that needs to be accessed in parallel to separate drives or (again) put individual spans of time in different places. More information:

http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html


Edit: For more on why I'd suggest avoiding MySQL, let me direct you here: http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error
