Efficient search in a huge multi-dimensional matrix
I am looking for an efficient way to search for data in a huge multi-dimensional matrix.
My application contains data that is characterized by multiple dimensions. Imagine keeping data about all sales in a company (my application is totally different, but this is just to demonstrate the problem). Every sale is characterized by:
- the product that is being sold
- the customer that bought the product
- the day on which it has been sold
- the employee that sold the product
- the payment method
- the quantity sold
I have millions of sales, done on thousands of products, by hundreds of employees, on lots of days.
I need a fast way to calculate e.g.:
- the total quantity sold by an employee on a certain day
- the total quantity bought by a customer
- the total quantity of a product paid by credit card
- ...
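I need to store the data at the most detailed level, and I could use a map where the key is the combination of all dimensions, like this:
struct Combination  // needs an operator< (or a custom comparator) to serve as a std::map key
{
    Product *product;
    Customer *customer;
    Day *day;
    Employee *employee;
    Payment *payment;
};
std::map<Combination, quantity> data;  // quantity stands for the numeric type of the amount sold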
But since I don't know beforehand which queries will be performed, I would need multiple combination classes (with the data members in a different order) or maps with different comparison functions (each sorting on a different sequence of dimensions).
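To make the "maps with different comparison functions" idea concrete, here is a minimal sketch. It assumes each dimension is represented by an integer ID, and the names `Dim`, `Key`, `OrderBy`, `SalesMap` and `totalForLeading` are hypothetical, introduced only for illustration. A single key type is reused, and the comparator carries the dimension order as run-time data, so no extra combination classes are needed. Each additional sort order still costs a full extra map.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical dimension indices; a key stores one integer ID per dimension.
enum Dim : std::size_t { kProduct, kCustomer, kDay, kEmployee, kPayment, kDimCount };
using Key = std::array<std::uint32_t, kDimCount>;

// Compares keys lexicographically in a dimension order chosen at run time.
// `order` must list every dimension exactly once, otherwise distinct keys
// would compare as equivalent.
struct OrderBy {
    std::array<Dim, kDimCount> order;
    bool operator()(const Key& a, const Key& b) const {
        for (Dim d : order) {
            if (a[d] != b[d]) return a[d] < b[d];
        }
        return false;
    }
};

using SalesMap = std::map<Key, long, OrderBy>;  // value: quantity

// Sums the quantity of all entries whose leading dimension (the first one in
// the map's sort order) equals `id`. Such entries are contiguous in the map.
long totalForLeading(const SalesMap& m, std::uint32_t id) {
    const Dim lead = m.key_comp().order[0];
    Key lo{};          // every other dimension at its minimum value (0)
    lo[lead] = id;
    long total = 0;
    for (auto it = m.lower_bound(lo); it != m.end() && it->first[lead] == id; ++it) {
        total += it->second;
    }
    return total;
}
```

A map constructed with `OrderBy{{kEmployee, kDay, kProduct, kCustomer, kPayment}}` answers per-employee totals with one `lower_bound` and a short walk, while a second map with a different order would be needed for, say, per-customer totals.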
Possibly, the problem could be simplified by giving each product, customer, ... a number instead of a pointer to it, but even then I would end up using a lot of memory.
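As a rough illustration of the integer-ID idea, the sketch below stores each sale as a small record and answers arbitrary queries with a linear scan. The field widths and the names `Sale`, `Filter` and `totalQuantity` are assumptions made for this example, not part of the original design. On typical platforms the record takes 16 bytes, versus 40 bytes for five 64-bit pointers alone. It requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// One sale, identified by small integer IDs instead of pointers.
// The field widths are assumptions chosen for illustration.
struct Sale {
    std::uint32_t customer;
    std::uint32_t quantity;
    std::uint16_t product;
    std::uint16_t day;
    std::uint16_t employee;
    std::uint8_t  payment;
};

// An empty field means "any value" for that dimension.
struct Filter {
    std::optional<std::uint32_t> customer;
    std::optional<std::uint16_t> product, day, employee;
    std::optional<std::uint8_t>  payment;
};

// Linear scan: sums the quantity of every sale that matches the filter.
std::uint64_t totalQuantity(const std::vector<Sale>& sales, const Filter& f) {
    std::uint64_t total = 0;
    for (const Sale& s : sales) {
        if (f.customer && *f.customer != s.customer) continue;
        if (f.product  && *f.product  != s.product)  continue;
        if (f.day      && *f.day      != s.day)      continue;
        if (f.employee && *f.employee != s.employee) continue;
        if (f.payment  && *f.payment  != s.payment)  continue;
        total += s.quantity;
    }
    return total;
}
```

A scan like this handles any combination of dimensions without extra indexes, at the cost of touching every record per query; whether that is fast enough for millions of sales depends on how often queries run between data changes.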
Are there any data structures that could help make this kind of search efficient?
EDIT:
Just to clarify some things: On disk my data is stored in a database, so I'm not looking for ways to change this.
The problem is that to perform my complex mathematical calculations, I have all this data in memory, and I need an efficient way to search this data in memory.
Could an in-memory database help? Maybe, but I fear that an in-memory database might have a serious impact on memory consumption and on performance, so I'm looking for better alternatives.
EDIT (2):
Some more clarifications: my application will perform simulations on the data, and in the end the user is free to save this data to my database or not. So the data itself changes all the time. While these simulations run and the data changes, I need to query the data as explained before.
So again, simply querying the database is not an option. I really need (complex?) in-memory data structures.
Comments (2)
EDIT: to replace earlier answer.
Can you imagine any other possible choice besides running qsort() on that giant array of structs? There's just no other way that I can see. Maybe you can sort it just once at time zero and keep it sorted as you do dynamic insertions/deletions of entries.
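A minimal sketch of the "sort once, then keep it sorted" idea follows. It is written in C++ to match the question, so `std::sort`, `std::upper_bound` and `std::equal_range` stand in for `qsort()`; the `Entry` layout and the function names are assumptions made for illustration. Only one sort order, (employee, day), is covered, so queries on other dimensions would need another sorted copy or a scan.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical record reduced to the two dimensions used for sorting.
struct Entry {
    std::uint32_t employee;
    std::uint32_t day;
    std::uint32_t quantity;
};

// Sort order: employee first, then day.
inline bool byEmployeeDay(const Entry& a, const Entry& b) {
    return std::tie(a.employee, a.day) < std::tie(b.employee, b.day);
}

// Inserts while keeping the vector sorted: binary search for the position,
// then a shift of the trailing elements (linear in the worst case).
void insertSorted(std::vector<Entry>& v, const Entry& e) {
    v.insert(std::upper_bound(v.begin(), v.end(), e, byEmployeeDay), e);
}

// Sums the quantity for one (employee, day) pair using binary search.
std::uint64_t totalFor(const std::vector<Entry>& v,
                       std::uint32_t employee, std::uint32_t day) {
    const Entry probe{employee, day, 0};
    auto range = std::equal_range(v.begin(), v.end(), probe, byEmployeeDay);
    std::uint64_t total = 0;
    for (auto it = range.first; it != range.second; ++it) {
        total += it->quantity;
    }
    return total;
}
```

The initial sort would be a single `std::sort(v.begin(), v.end(), byEmployeeDay)` call after loading; afterwards every change goes through `insertSorted` (or an analogous erase) so that `totalFor` can keep relying on the ordering.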
Using a database (in-memory or not) to work with your data seems like the right way to do this.
If you don't want to do that, you don't have to implement lots of combination classes; just use a collection that can hold any of the objects.