Linked 2D matrix in C#
I need to implement this scenario in C#:
The matrix will be very large, maybe 10000x10000 or larger. I will use it as the distance matrix in a hierarchical clustering algorithm. In every iteration of the algorithm the matrix has to be updated (joining 2 rows into 1 and 2 columns into 1). With a plain double[,] or double[][] matrix these operations are very "expensive".
Please, can anyone suggest a C# implementation for this scenario?
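For context, joining two clusters on a plain matrix means rebuilding the whole thing. A sketch of that naive update (Math.Min stands in for whichever linkage rule is actually used; all names here are illustrative) shows why each merge costs O(n²), and a full run of n-1 merges costs O(n³):

```csharp
using System;

static class NaiveMerge
{
    // Joins cluster j into cluster i by rebuilding the whole matrix.
    // Each call is O(n^2); with n - 1 merges the total cost is O(n^3).
    public static double[,] JoinRowsAndColumns(double[,] d, int i, int j)
    {
        int n = d.GetLength(0);

        // Map new indices to the old indices that survive (everything except j).
        var map = new int[n - 1];
        for (int oldIdx = 0, newIdx = 0; oldIdx < n; oldIdx++)
            if (oldIdx != j) map[newIdx++] = oldIdx;

        var result = new double[n - 1, n - 1];
        for (int r = 0; r < n - 1; r++)
        {
            for (int c = 0; c < n - 1; c++)
            {
                double v = d[map[r], map[c]];
                // Distances to the merged cluster i are combined element-wise;
                // Math.Min (single linkage) is just an illustrative choice.
                if (map[r] == i) v = Math.Min(d[i, map[c]], d[j, map[c]]);
                if (map[c] == i) v = Math.Min(d[map[r], i], d[map[r], j]);
                result[r, c] = v;
            }
        }
        return result;
    }
}
```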
Comments (6)
Do you have an algorithm at the moment? And what do you mean by expensive — memory or time? If memory is the problem: there is not much you can do in C#, but you could consider executing the calculation inside a database using temporary objects. If time is the problem: you can use parallelism to join columns and rows.
But beside that, I think a simple double[,] array is the fastest and most memory-sparing option you can get in C#, because accessing array values is an O(1) operation and arrays have the least memory and management overhead (compared to lists and dictionaries).
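A sketch of what such a parallel row/column join could look like on a double[,] (the method name and the Math.Min combine rule are illustrative choices, not taken from the answer):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelJoin
{
    // Folds cluster j's distances into cluster i's row and column in place.
    // Math.Min corresponds to single linkage; other linkages combine differently.
    public static void JoinInto(double[,] d, int i, int j)
    {
        int n = d.GetLength(0);
        Parallel.For(0, n, k =>
        {
            if (k == i || k == j) return;   // skip the merged/stale indices themselves
            double merged = Math.Min(d[i, k], d[j, k]);
            d[i, k] = merged;   // update row i
            d[k, i] = merged;   // keep the matrix symmetric
        });
        // Row/column j is now stale; the caller has to mark index j as inactive
        // instead of physically shrinking the array.
    }
}
```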
As mentioned above, a basic double[,] is going to be the most effective way of handling this in C#.
Remember that C# sits on top of managed memory, and as such you have less fine-grained control over low-level (in terms of memory) operations than in something like plain C. Creating your own objects in C# to add functionality will only use more memory in this scenario, and will likely slow the algorithm down as well.
If you have yet to pick an algorithm, CURE seems to be a good bet. The choice of algorithm may affect your data structure choice, but that's not likely.
You will find that the algorithm determines the theoretical limits of 'cost' at any rate. For example, you will read that for CURE you are bound by O(n² log n) running time and O(n) memory use.
I hope this helps. If you can provide more detail, we might be able to assist further!
N.
It's not possible to 'merge' two rows or two columns; you'd have to copy the whole matrix into a new, smaller one, which is indeed unacceptably expensive.
You should probably just add the values from one row into the other and then ignore them, acting as if they were removed.
An array of arrays (double[][]) is actually faster than double[,], but it takes more memory.
The whole array-merging business might not be needed if you change the algorithm a bit, but this might help you:
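A rough sketch of the kind of helper-list indexing meant here (a reconstruction with illustrative names, not the original snippet):

```csharp
using System.Collections.Generic;

// A flat backing array holds all distances; two index lists map the
// logical rows/columns that are still alive to positions in that array.
class IndexedMatrix
{
    private readonly double[] _data;   // n * n backing store, never shrinks
    private readonly int _stride;      // original n
    private readonly List<int> _rows = new List<int>();
    private readonly List<int> _cols = new List<int>();

    public IndexedMatrix(int n)
    {
        _stride = n;
        _data = new double[n * n];
        for (int i = 0; i < n; i++) { _rows.Add(i); _cols.Add(i); }
    }

    public int Count => _rows.Count;

    public double this[int r, int c]
    {
        get => _data[_rows[r] * _stride + _cols[c]];
        set => _data[_rows[r] * _stride + _cols[c]] = value;
    }

    // Removing a row/column is O(n) list bookkeeping; the cells in the
    // backing array are simply never looked at again (the "leak" noted below).
    public void RemoveAt(int index)
    {
        _rows.RemoveAt(index);
        _cols.RemoveAt(index);
    }
}
```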
In this code I use two 1D helper lists to calculate the index into a big array containing the data. Deleting rows/columns is really cheap since I only need to remove that index from the helper-lists. But of course the memory in the big array remains, i.e. depending on your usage you have a memory-leak.
Hm, to me this looks like a simple binary tree. The left node represents the next value in a row and the right node represents the column.
So it should be easy to iterate rows and columns and combine them.
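In code, the node shape being described might look something like this (illustrative only):

```csharp
// One node per matrix cell; the two links chain it into its row and its column,
// matching the "left = next value in the row, right = next in the column" idea above.
class DistanceNode
{
    public double Value;
    public DistanceNode NextInRow;     // the "left" link
    public DistanceNode NextInColumn;  // the "right" link
}
```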
Thank you for the answers.
At the moment I'm using this solution:
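Roughly this kind of linked-node structure, where every cell keeps references to its neighbours so rows and columns can be joined or dropped by re-pointing links instead of copying (a sketch with illustrative names, not the actual code from the post):

```csharp
using System.Collections.Generic;

// Each cell of the matrix is a node linked to its four neighbours.
class MatrixNode
{
    public double Value;
    public MatrixNode Left, Right, Up, Down;
}

class LinkedMatrix
{
    // First node of every surviving row and column.
    public readonly List<MatrixNode> RowHeads = new List<MatrixNode>();
    public readonly List<MatrixNode> ColumnHeads = new List<MatrixNode>();

    // Removing row r is O(n) pointer updates -- no matrix copy.
    public void RemoveRow(int r)
    {
        int c = 0;
        for (var node = RowHeads[r]; node != null; node = node.Right, c++)
        {
            if (node.Up != null) node.Up.Down = node.Down;
            else ColumnHeads[c] = node.Down;   // node was the top of its column
            if (node.Down != null) node.Down.Up = node.Up;
        }
        RowHeads.RemoveAt(r);
    }
}
```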
Then I'm building the connections between the nodes. After that the matrix is ready.
This will use more memory, but I think operations like adding rows and columns or joining rows and columns will be much faster.