How efficient is it to use an in-memory database to store millions of temporary values?

Posted on 2024-09-27 15:41:19

My application currently stores millions of Double elements for a calculation. These values are only temporary values before they are used for a specific algorithm that is run at the end of the calculation. Once this calculation is done, the millions of values can be discarded.

The full story is here, if you need more details.

One of the solutions that was proposed is to use an in-memory database.

So if I go with this solution, I will use this database to store my values in a table to replace my current Map<String, List<Double>>, like:

create table CALCULATION_RESULTS_XXX (
  deal_id varchar(255),
  calc_value double precision  -- "values" is a reserved SQL keyword, so the column is renamed
);

(one table per calculation, XXX is the calculation ID)

So during the calculation, I will do the following:

When the calculation is started, I create the CALCULATION_RESULTS_XXX table.
Every time I need to add a value, I insert a record in this table.
At the end of the calculation, I use the table content for my algorithm.
Finally, I drop this table.
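The four steps above can be sketched with plain JDBC against an H2 in-memory database. This is a minimal sketch, not the asker's code. It assumes the H2 driver jar is on the classpath, uses a hypothetical URL `jdbc:h2:mem:calc` and a hypothetical column name `calc_value`, and, for brevity, receives the values as the existing `Map<String, List<Double>>`, whereas the real calculation would insert each value as it is produced. The "algorithm" in step 3 is a placeholder that only counts rows per deal. Because the calculation ID is spliced into the table name, it is validated first.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the create -> insert -> use -> drop lifecycle described in the question.
// Assumptions: H2 on the classpath, hypothetical URL "jdbc:h2:mem:calc",
// hypothetical column name "calc_value".
class CalculationTableLifecycle {

    // Table names cannot be bound as JDBC parameters, so only allow safe characters.
    static String tableName(String calculationId) {
        if (!calculationId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unsafe calculation id: " + calculationId);
        }
        return "CALCULATION_RESULTS_" + calculationId;
    }

    static Map<String, Long> run(String calculationId, Map<String, List<Double>> results)
            throws SQLException {
        String table = tableName(calculationId);
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:calc");
             Statement st = conn.createStatement()) {
            // 1. Create the per-calculation table.
            st.execute("CREATE TABLE " + table
                    + " (deal_id VARCHAR(255), calc_value DOUBLE PRECISION)");
            try {
                // 2. Insert one record per value, as the question describes.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO " + table + " (deal_id, calc_value) VALUES (?, ?)")) {
                    for (Map.Entry<String, List<Double>> e : results.entrySet()) {
                        for (double v : e.getValue()) { // unboxes each Double
                            ps.setString(1, e.getKey());
                            ps.setDouble(2, v);
                            ps.executeUpdate();
                        }
                    }
                }
                // 3. Use the table content (placeholder algorithm: row count per deal).
                Map<String, Long> counts = new HashMap<>();
                try (ResultSet rs = st.executeQuery(
                        "SELECT deal_id, COUNT(*) FROM " + table + " GROUP BY deal_id")) {
                    while (rs.next()) {
                        counts.put(rs.getString(1), rs.getLong(2));
                    }
                }
                return counts;
            } finally {
                // 4. Drop the table even if the calculation failed.
                st.execute("DROP TABLE " + table);
            }
        }
    }
}
```

With a private `mem:` database the data also disappears when the last connection closes, so the explicit `DROP TABLE` mainly matters if several calculations share one connection.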

As explained in the other question, my calculation can currently keep several hundred MB of data in memory: a list of 30 * 1,000,000 Doubles alone needs about 240 MB.
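The 240 MB figure is the raw payload, 30 × 1,000,000 values at 8 bytes each. Stored as boxed `Double` objects in a `List`, the real footprint is larger, and that is the baseline any in-memory database would have to beat; an embedded database such as H2 in its default `mem:` mode also keeps its rows in the same JVM heap. The sketch below only does the arithmetic. The 16-byte object size and 4-byte reference size are assumptions typical of a 64-bit HotSpot JVM with compressed oops, not measured values.

```java
// Back-of-the-envelope memory estimate for 30 lists of 1,000,000 values each.
// ASSUMED sizes for the boxed case (64-bit HotSpot, compressed oops); measure on your JVM.
class MemoryEstimate {
    static final long LISTS = 30;
    static final long VALUES_PER_LIST = 1_000_000;

    // Raw payload: 8 bytes per double, the source of the ~240 MB figure.
    static long rawPayloadBytes() {
        return LISTS * VALUES_PER_LIST * Double.BYTES;
    }

    // Rough List<Double> cost: one boxed object plus one reference per value.
    static long boxedListBytes(long boxedObjectBytes, long referenceBytes) {
        return LISTS * VALUES_PER_LIST * (boxedObjectBytes + referenceBytes);
    }

    public static void main(String[] args) {
        System.out.println("raw doubles: " + rawPayloadBytes() / 1_000_000 + " MB");
        System.out.println("boxed (assumed 16 + 4 bytes): "
                + boxedListBytes(16, 4) / 1_000_000 + " MB");
    }
}
```

Running `main` prints `raw doubles: 240 MB` and `boxed (assumed 16 + 4 bytes): 600 MB`; the second number is only as good as the assumed sizes.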

My questions are:

If I go with an in-memory database, will my memory consumption decrease?
What specific points do I need to take care of regarding the database usage (or table creation), the data insertion, etc.?
I think I will choose the H2 database. Do you think it's the best choice for my needs?

若水微香 2024-10-04 15:41:19

A simple HashMap backed by Terracotta would do better, and it lets you store collections larger than the JVM's virtual memory.

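Embedded databases, especially SQL-based ones, add complexity and overhead to your code, so they are not worth it. If you really need persistent storage with random access, try one of the NoSQL DBs, such as CouchDB, Cassandra or neo4j.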

蓝颜夕 2024-10-04 15:41:19

This one is quite simple: you really need to try it and see what the (performance) results are.

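You already have an implementation that uses only simple in-memory structures. Personally, given that even the cheapest Dell computers come with 1GB+ of RAM, you might as well stick with it. Beyond that, plugging in one or two databases should be fairly simple. I would consider Sleepycat Berkeley DB (now owned by Oracle...), since you don't need to use SQL and it should be very efficient. (It does support Java.)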

If the results look promising, I would investigate further, but this should really take a few days at most, benchmarking included.
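A crude measurement of the current `Map<String, List<Double>>` gives a baseline to compare an in-memory database against. This is a minimal sketch: the deal names and values are made up, `System.gc()` is only a hint to the JVM, and the heap figure from `Runtime` is approximate. A serious comparison would use a benchmark harness such as JMH.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Crude baseline: time and approximate heap growth of the current Map<String, List<Double>>.
// Deal ids ("deal-0", ...) and values are made up for illustration.
class StorageBenchmark {

    // Approximate used heap; System.gc() is only a hint and may be ignored.
    static long usedHeapBytes() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // The current approach: one boxed list of values per deal.
    static Map<String, List<Double>> fillMap(int deals, int valuesPerDeal) {
        Map<String, List<Double>> map = new HashMap<>();
        for (int d = 0; d < deals; d++) {
            List<Double> values = new ArrayList<>(valuesPerDeal);
            for (int i = 0; i < valuesPerDeal; i++) {
                values.add(i * 0.5);
            }
            map.put("deal-" + d, values);
        }
        return map;
    }

    public static void main(String[] args) {
        long heapBefore = usedHeapBytes();
        long start = System.nanoTime();
        Map<String, List<Double>> map = fillMap(30, 100_000); // scale up to 1_000_000 if the heap allows
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long heapMb = (usedHeapBytes() - heapBefore) / 1_000_000;
        System.out.println(map.size() + " deals: " + elapsedMs + " ms, ~" + heapMb + " MB heap");
        // Run the in-memory-database variant with the same inputs and compare.
    }
}
```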

尤怨 2024-10-04 15:41:19

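I don't know whether it will be faster, so you will have to try it. What I do want to suggest is inserting a whole list in one batch once you no longer need it immediately. Don't save the values one by one :)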
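Batching can be done with the standard JDBC `addBatch()` / `executeBatch()` calls. This is a minimal sketch that assumes a table with columns `deal_id` and `calc_value` (hypothetical names; `values` itself is a reserved SQL word), and a batch size of 10,000 that is an arbitrary starting point to tune, not a recommendation from the answer.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Inserts a whole list with JDBC batches instead of one executeUpdate() per value.
// Assumptions: columns deal_id / calc_value (hypothetical), BATCH_SIZE chosen arbitrarily.
class BatchInsert {
    static final int BATCH_SIZE = 10_000;

    static String insertSql(String table) {
        return "INSERT INTO " + table + " (deal_id, calc_value) VALUES (?, ?)";
    }

    // Returns how many executeBatch() round trips were made.
    static int insertList(Connection conn, String table, String dealId, List<Double> values)
            throws SQLException {
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table))) {
            int pending = 0;
            for (double v : values) {
                ps.setString(1, dealId);
                ps.setDouble(2, v);
                ps.addBatch();
                if (++pending == BATCH_SIZE) { // flush a full batch
                    ps.executeBatch();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the remainder
                ps.executeBatch();
                batches++;
            }
        }
        return batches;
    }
}
```

Flushing every `BATCH_SIZE` rows keeps the driver's pending batch bounded instead of queuing a million rows at once.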

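If your final algorithm can be expressed in SQL, it may also be worth your time to do that instead of reloading all the lists. In any case, don't put indexes or constraints on things like the values, and preferably don't allow NULLs either (if possible). Maintaining indexes and constraints costs time, and allowing NULLs also costs time or adds overhead. The deal_ids can (and should) of course be indexed, since they are primary keys.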
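The schema advice can be written down as the SQL it implies: NOT NULL columns, no index or constraint on the values, an index on `deal_id`, and an aggregation pushed into SQL instead of reloading the lists. This is a sketch under assumptions: `calc_value` is a hypothetical column name, `AVG` stands in for whatever the final algorithm actually computes, and the `deal_id` index is non-unique because in a one-row-per-value layout each deal appears many times. Creating the index after the bulk insert typically avoids maintaining it for every inserted row.

```java
// SQL implied by the advice: NOT NULL columns, no index on the values,
// a non-unique index on deal_id, and the aggregation done in SQL.
// Assumptions: hypothetical column calc_value, AVG as a placeholder algorithm.
class SchemaAdvice {

    static String createTable(String table) {
        return "CREATE TABLE " + table
                + " (deal_id VARCHAR(255) NOT NULL, calc_value DOUBLE PRECISION NOT NULL)";
    }

    // Non-unique: one deal_id has many rows in this layout.
    static String createDealIndex(String table) {
        return "CREATE INDEX IDX_" + table + "_DEAL ON " + table + " (deal_id)";
    }

    // Computed inside the database instead of reloading every list into Java.
    static String aggregatePerDeal(String table) {
        return "SELECT deal_id, AVG(calc_value) FROM " + table + " GROUP BY deal_id";
    }

    public static void main(String[] args) {
        String table = "CALCULATION_RESULTS_42";
        System.out.println(createTable(table));
        // ... bulk insert here, then create the index ...
        System.out.println(createDealIndex(table));
        System.out.println(aggregatePerDeal(table));
    }
}
```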

It's not much, but at least it's better than a single downvoted answer :)