如何在C++中创建20000*20000矩阵
我尝试计算一个有20000个点的问题,所以有一个有20000*20000个元素的距离矩阵,我如何在C++中存储这个矩阵?我在具有 4 GB RAM 的计算机上使用 Visual Studio 2008。任何建议将不胜感激。
I try to calculate a problem with 20000 points, so there is a distance matrix with 20000*20000 elements, how can I store this matrix in C++? I use Visual Studio 2008, on a computer with 4 GB of RAM. Any suggestion will be appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
稀疏矩阵可能就是您所寻找的。许多问题并不是矩阵的每个单元格都有值。 SparseLib++ 是一个允许高效矩阵运算的库。
A sparse matrix may be what you looking for. Many problems don't have values in every cell of a matrix. SparseLib++ is a library which allows for effecient matrix operations.
避免您正在考虑的暴力方法,并尝试设想一种涉及填充的解决方案一个 20000 个元素的列表,而不是一个涵盖所有可能排列的数组。
对于初学者,请考虑以下简单的方法,根据您的问题的具体情况,您可以对其进行改进:
Avoid the brute force approach you're contemplating and try to envision a solution that involves populating a single 20000 element list, rather than an array that covers every possible permutation.
For starters, consider the following simplistic approach which you may be able to improve upon, given the specifics of your problem:
您可以将矩阵表示为单个大数组。这样做是否是个好主意由您决定。
如果每个单元需要四个字节,那么您的矩阵只有 4*20000*20000,即 1.6GB。任何平台都应该为单个进程提供足够的内存。 Windows 默认为 32 位进程提供 2GiB,如果需要更多,可以使用链接器选项。我尝试过的所有 32 位 unice 都超过 2.5GiB。
You can represent your matrix as a single large array. Whether it's a good idea to do so is for you to determine.
If you need four bytes per cell, your matrix is only 4*20000*20000, that is, 1.6GB. Any platform should give you that much memory for a single process. Windows gives you 2GiB by default for 32-bit processes -- and you can play with the linker options if you need more. All 32-bit unices I tried gave you more than 2.5GiB.
您需要内存中的矩阵吗?
根据您需要执行的计算的复杂性,您可以简单地使用一个函数来动态计算您的距离。如果您只使用其中的一些距离值,这甚至可能比预先计算单个距离值更快。
Is there a reason you need the matrix in memory?
Depending on the complexity of calculations you need to perform you could simply use a function that calculates your distances on the fly. This could even be faster than precalculating ever single distance value if you would only use some of them.
如果没有更多地参考手头的问题(以及矩阵的使用),您将得到很多答案......所以请纵容我。
这里的经典方法是使用稀疏矩阵,但是默认值可能类似于“未计算”,这需要特殊处理。
也许您可以使用缓存方法来代替。
显然,我会说您希望避免不断地重新计算距离,因此您希望将它们保留在这个巨大的矩阵中。但请注意,您始终可以重新计算它们。一般来说,我想说的是,尝试存储可以在加速时重新计算的值确实是缓存的意义所在。
所以我建议使用一个距离类来为您抽象缓存。
基本思想很简单:
实践是当然,有点复杂,特别是为了效率,并且由于尺寸有限,需要一个算法来选择这些元素等......
所以在我们深入研究技术实现之前,请告诉我这是否是您正在寻找的为了。
Without more references to the problem at hand (and the use of the matrix), you are going to get a lot of answers... so indulge me.
The classic approach here would be to go with a sparse matrix, however the default value would probably be something like 'not computed', which would require special handling.
Perhaps that you could use a caching approach instead.
Apparently I would say that you would like to avoid recomputing the distances on and on and so you'd like to keep them in this huge matrix. However note that you can always recompute them. In general, I would say that trying to store values that can be recomputed for a speed-off is really what caching is about.
So i would suggest using a distance class that abstract the caching for you.
The basic idea is simple:
The practice is a bit more complicated, of course, especially for efficiency and because of the limited size which requires an algorithm for the selection of those elements etc...
So before we delve in the technical implementation, just tell me if that's what you're looking for.
您的计算机应该能够处理 1.6 GB 的数据(假设为 32 位)
然后使用:
Your computer should be able to handle 1.6 GB of data (assuming 32bit)
And then use:
您可以(通过使用小数据类型),但您可能不想这样做。
最好使用四叉树(如果您需要找到最近的 N 个匹配项)或列表网格(如果您想找到 R 内的所有点)。
在物理学中,您可以用场或代表性点的合并来近似遥远的点。
总有解决办法。你有什么问题吗?
You can (by using small datatypes), but you probably don't want to.
You are better off using a quad tree (if you need to find the nearest N matches), or a grid of lists (if you want to find all points within R).
In physics, you can just approximate distant points with a field, or a representative amalgamation of points.
There's always a solution. What's your problem?
伙计,你应该避免 n2 问题...
将你的 20 000 个点放入体素网格中。
找到最接近的点对应该类似于 n log n。
Man you should avoid the n² problem...
Put your 20 000 points into a voxel grid.
Finding closest pair of points should then be something like n log n.
正如其他答案所述,您应该努力尝试使用稀疏矩阵或提出一种不需要同时在矩阵中包含所有数据的不同算法。
如果您确实需要它,也许是像stxxl 可能很有用,因为它是专门为大型数据集设计的。它几乎透明地为您处理交换。
As stated by other answers, you should try hard to either use sparse matrix or come up with a different algorithm that doesn't need to have all the data at once in the matrix.
If you really need it, maybe a library like stxxl might be useful, since it's specially designed for huge datasets. It handles the swapping for you almost transparently.
非常感谢您的回答。我正在做的是解决一个大约20000个节点的车辆路径问题。我需要一个距离矩阵,一个邻居列表矩阵(对于每个节点,根据距离列出所有其他节点)。该列表将经常用于查找谁可以成为候选人。我想有时如果我们可以在需要时进行计算,则可以省略距离矩阵。但邻居列表并不方便每次都创建。列表数据类型可以是 int。
To mgb:
64位Windows系统对这种情况有多大帮助?
Thanks a lot for your answers. What I am doing is to solve a vehicle routing problem with about 20000 nodes. I need one matrix for distance, one matrix for a neighbor list (for each node, list all other nodes according to the distance). This list will be used very often to find who can be some candidates. I guess sometimes distances matrix can be ommited if we can calculate when we need. But the neighbor list is not convenient to create every time. the list data type could be int.
To mgb:
how much can a 64 bit windows system help this situation?