Handling large quantities of float matrices
I'm working with in-memory storage of huge quantities of float matrices. These matrices store statistics data, and most of the time only a few cells contain a non-null value.
Let's consider this simple problem.
An item collects statistics over time. These statistics are stored in a single-row matrix of about 30 float entries. An item also has several different kinds of statistics, so for one item we can define this simple structure:
struct ItemStatistics
{
    uint64_t item_id;
    float * statistics_a;
    ...
    float * statistics_z;
};
While the application (a server) is running, I collect a bunch of statistics for thousands of items. We can then define a global structure that stores the statistics for all items as a map for fast access:
typedef std::map<uint64_t, ItemStatistics*> StatisticsDb; // item_id <-> statistics
This naive representation is not memory efficient, because every statistics_x object is a fixed-size array of about 30 entries. Since on average only 5 values are collected, the matrices are about 10% full most of the time, sometimes less.
Is there a memory-efficient way to store this data?
Is there a way to avoid the malloc overhead of each matrix allocation? (For a million items and 4 kinds of statistics, that is about 4 million malloc operations, not counting the std::map insert overhead...)
Comments (1)
Maybe SparseLib++ could be something you're interested in. Take a look and see if it suits your needs: http://math.nist.gov/sparselib++/
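If a full library turns out to be more than you need, the same sparse idea can be sketched by hand. The following is only a rough illustration, not the SparseLib++ API: it keeps just the cells that were actually set, as (kind, column, value) triples in one sorted vector per item, so an item needs a single growing buffer instead of one 30-float array per statistic kind. The names Cell, SparseItemStatistics and SparseStatisticsDb are made up for this example, numbering the kinds 0 for statistics_a, 1 for statistics_b and so on is an assumption, and so is reading an unset cell as 0.0f.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One non-empty cell (hypothetical layout, not taken from the question).
struct Cell {
    std::uint8_t kind;    // assumed numbering: 0 = statistics_a, 1 = statistics_b, ...
    std::uint8_t column;  // 0..29 for the roughly 30-entry rows in the question
    float value;
};

// All statistics of one item, kept sorted by (kind, column).
class SparseItemStatistics {
public:
    // Returns the stored value, or 0.0f (an assumption) if the cell was never set.
    float get(std::uint8_t kind, std::uint8_t column) const {
        const std::uint16_t k = key(kind, column);
        auto it = std::lower_bound(cells_.begin(), cells_.end(), k, before);
        if (it != cells_.end() && key(it->kind, it->column) == k) {
            return it->value;
        }
        return 0.0f;
    }

    // Overwrites an existing cell or inserts a new one at its sorted position.
    void set(std::uint8_t kind, std::uint8_t column, float value) {
        const std::uint16_t k = key(kind, column);
        auto it = std::lower_bound(cells_.begin(), cells_.end(), k, before);
        if (it != cells_.end() && key(it->kind, it->column) == k) {
            it->value = value;
        } else {
            cells_.insert(it, Cell{kind, column, value});
        }
    }

    // Number of cells actually stored.
    std::size_t size() const { return cells_.size(); }

private:
    // Packs (kind, column) into one sortable 16-bit key.
    static std::uint16_t key(std::uint8_t kind, std::uint8_t column) {
        return static_cast<std::uint16_t>((kind << 8) | column);
    }

    // Comparator for lower_bound: is this cell ordered before key k?
    static bool before(const Cell& c, std::uint16_t k) {
        return key(c.kind, c.column) < k;
    }

    std::vector<Cell> cells_;
};

// Storing the statistics by value in the map avoids a separate allocation
// for the per-item struct itself.
using SparseStatisticsDb = std::map<std::uint64_t, SparseItemStatistics>;
```

Usage would look like db[item_id].set(0, 3, 1.5f) followed by db[item_id].get(0, 3). On common platforms a Cell is typically 8 bytes, so a row with 5 values takes about 40 bytes instead of the 120 bytes of a dense 30-float row. Lookups are a binary search rather than a direct index, which is usually fine with this few entries per item but is worth measuring on your own workload.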