在 .NET 中存储稀疏矩阵的最佳方法

发布于 2024-07-17 08:30:14 字数 519 浏览 4 评论 0 原文

我们有一个存储稀疏矩阵的应用程序。 该矩阵的条目主要存在于矩阵主对角线周围。 我想知道是否有任何有效的算法(或现有的库)可以有效地处理这种稀疏矩阵? 优选地,这将是通用实现,其中每个矩阵条目可以是用户定义的类型。

编辑回应问题/答复:

当我说主要围绕主对角线时,我的意思是大多数矩阵的特征是大多数条目都聚集在主对角线之外,但可能有靠近对角线的零,并且有可能是远离对角线的非零值。 我想要对这里的“大多数”情况有效的东西。

我将用这个做什么? 我需要能够有效地访问行中的所有值或列中的所有值。 存储的值将是布尔值。 一个例子是:

  1. 对于一行中的所有 true 值,对于每个列,一个 true 出现在将列的所有条目设置为某个内容
  2. 对于一行中的所有 false 值,将条目设置为某个内容

这之前都是通过链表完成的,但是实施起来非常混乱。 我希望通过稀疏矩阵可以改进算法,但事实证明找到“正确”类型的稀疏矩阵算法很困难。

ps 感谢您到目前为止的回复

We have an application that stores a sparse matrix. This matrix has entries that mostly exist around the main diagonal of the matrix. I was wondering if there were any efficient algorithms (or existing libraries) that can efficiently handle sparse matrices of this kind? Preferably, this would be a generic implementation where each matrix entry can be a user-defined type.

Edit in response to a question/response:

When I say mostly around the main diagonal I mean that the characteristics of most of the matrices will be that most entries are clustered off of the main diagonal but there could be zeroes close to the diagonal and there could be non-zero values far out from the diagonal. I want something efficient for 'most' cases here.

What will I be using this for? I need to be able to have efficient access to all values in a row or all values in a column. The values stored would be Boolean values. An example would be:

  1. For all true values in a row, foreach column a true appears in set all the entries of the column to something
  2. For all false values in a row, set the entry to something

This was all done with linked lists previously but was very confusing to implement. I was hoping that with a sparse matrix I could improve the algorithm but finding the 'right' type of sparse matrix algorithm has proved difficult.

p.s. Thanks for the responses thus far

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

拥抱我好吗 2024-07-24 08:30:15

您可以使用基于单元格的 [row,col] 的索引。 由于数据位于对角线上,因此存储行索引和与数据关联的列索引的典型方法并不是最佳的。 以下是您可以用来执行此操作的一些代码:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long Size { get; private set; }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.Size = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;
            }
        }
    }

    static void Main()
    {
        var sm = new SparseMatrix<int>(512, 512);
        sm[42, 42] = 42;
        int val1 = sm[13, 13];
        int val2 = sm[42, 42];

        Console.WriteLine("VAL1 = " + val1); // prints out 0
        Console.WriteLine("VAL2 = " + val2); // prints out 42

        Console.ReadLine();
    }

请注意,当 T 是结构体时,您可能必须调用 IsCellEmpty,因为获取单元格的内容不会为 null,并且将具有该类型的默认值。 您还可以扩展代码,根据 Size 属性和 _cells.Count 提供快速的“SparseRatio”。

编辑:

好吧,如果您对速度感兴趣,您可以在空间与速度之间进行权衡。 与其只有一本字典,不如拥有三本! 它使您的空间增加了三倍,但它使您可以轻松地以任何方式进行枚举。 以下是一些新代码,表明:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long MaxSize { get; private set; }
        public long Count { get { return _cells.Count; } }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        private Dictionary<int, Dictionary<int, T>> _rows = 
            new Dictionary<int, Dictionary<int, T>>();

        private Dictionary<int, Dictionary<int, T>> _columns = 
            new Dictionary<int, Dictionary<int, T>>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.MaxSize = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;

                UpdateValue(col, row, _columns, value);
                UpdateValue(row, col, _rows, value);
            }
        }

        private void UpdateValue(int index1, int index2, 
            Dictionary<int, Dictionary<int, T>> parent, T value)
        {
            Dictionary<int, T> dict;
            if (!parent.TryGetValue(index1, out dict))
            {
                parent[index2] = dict = new Dictionary<int, T>();
            }
            dict[index2] = value;
        }
    }

如果要迭代所有条目,请使用 _cells。 如果您想要给定列的所有行,请使用_columns。 如果您想要给定行中的所有列,请使用_rows

如果您想按排序顺序进行迭代,您可以开始将 LINQ 添加到混合中和/或使用带有封装条目的内部类的排序列表(该内部类必须存储行或列并实现 IComparableIComparable< T> 用于排序工作)。

You could use an index based on the [row,col] of the cell. Since the data is on a diagonal, the typical approach of storing the row index and the associated column indeces with data is not optimal. Here is some code you could use to do it:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long Size { get; private set; }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.Size = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;
            }
        }
    }

    static void Main()
    {
        var sm = new SparseMatrix<int>(512, 512);
        sm[42, 42] = 42;
        int val1 = sm[13, 13];
        int val2 = sm[42, 42];

        Console.WriteLine("VAL1 = " + val1); // prints out 0
        Console.WriteLine("VAL2 = " + val2); // prints out 42

        Console.ReadLine();
    }

Note that when T is a struct, you might have to call the IsCellEmpty since getting the contents of a cell will not be null and will have the default value for that type. You can also expand the code to give you a quick "SparseRatio" based on the Size property and _cells.Count.

EDIT:

Well, if you are interesting is speed, you can do the trade-off of space vs speed. Instead of having only one dictionary, have three! It triples your space, but it makes enumerating in any way you want real easy. Here is some new code that shows that:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long MaxSize { get; private set; }
        public long Count { get { return _cells.Count; } }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        private Dictionary<int, Dictionary<int, T>> _rows = 
            new Dictionary<int, Dictionary<int, T>>();

        private Dictionary<int, Dictionary<int, T>> _columns = 
            new Dictionary<int, Dictionary<int, T>>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.MaxSize = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;

                UpdateValue(col, row, _columns, value);
                UpdateValue(row, col, _rows, value);
            }
        }

        private void UpdateValue(int index1, int index2, 
            Dictionary<int, Dictionary<int, T>> parent, T value)
        {
            Dictionary<int, T> dict;
            if (!parent.TryGetValue(index1, out dict))
            {
                parent[index2] = dict = new Dictionary<int, T>();
            }
            dict[index2] = value;
        }
    }

If you want to iterate over all the entries, use _cells. If you want all the rows for a given column use _columns. If you want all the columns in a given row use _rows.

If you want to iterate in sorted order, you can start to add LINQ into the mix and/or use a sorted list with an inner class that encapsulates an entry (which would have to store the row or column and implement IComparable<T> for sorting to work).

苦行僧 2024-07-24 08:30:15

我想 Dictionary> 就足够了。

I guess a Dictionary<int, Dictionary<int, object >> would suffice.

烏雲後面有陽光 2024-07-24 08:30:15

我没有使用过它,但是 Nmath Matrix 可以处理这些(不是免费的) 。

另外,.NET 的极端优化数值库(非免费)。

这是一个免费的:Math.NET 项目(具体来说MathNet.Numerics.LinearAlgebra.Sparse 命名空间)

I haven't used it, but Nmath Matrix handles these (not free).

Also, Extreme Optimization Numerical Libraries for .NET (not free).

Here's a free one: Math.NET Project (specifically MathNet.Numerics.LinearAlgebra.Sparse namespace)

七婞 2024-07-24 08:30:15

这里有两个问题:

  • “主要围绕主对角线”太模糊了。 如果元素位于带中,则使用带本身的带状存储,作为从主对角线偏移的向量。 如果元素随机分散在主对角线附近,则可以使用带状形式(带状形式中可能包含一些零),或者使用纯稀疏形式(仅存储元素及其在数组中的位置)。

  • 你会用这个矩阵做什么? 如果您的目标仅仅是高效存储,那么带状表单将是高效的,可以快速访问任何元素。 如果您将使用矩阵进行线性代数,但不超过矩阵向量乘法,那么带状形式仍然可以很好地工作。 如果您使用矩阵矩阵乘法或矩阵分解,其中填充会成为问题,那么纯稀疏形式可能更合适。 例如,两个带状矩阵的乘积将具有附加带,因此两个三对角矩阵的乘积将是五对角矩阵。 对于因式分解,重新排序有时有助于最大限度地减少填充。 (AMD 是一种选择,近似最小度排列,但还有其他方案。)

There are two questions here:

  • "Mostly around the main diagonal" is too vague. If the elements lie in bands, then use banded storage of the bands themselves, as vectors offset from the main diagonal. If the elements are scattered randomly in the vicinity of the main diagonal, then either use a banded form that may include some zeros in the bands, or use a pure sparse form that stores only the elements and their positions in the array.

  • What will you do with the matrix? If your goal is merely efficient storage, then a banded form will be efficient, with fast access to any element. If you will do linear algebra with the matrix, but never more than matrixvector multiplies, then the banded form will still work splendidly. If you work with matrixmatrix multiplies or matrix factorizations, where fill-in becomes a problem, then a pure sparse form may be more appropriate. For example, the product of two banded matrices will have additional bands, so the product of two tridiagonal matrices will be pentadiagonal. For a factorization, reorderings will sometimes be useful to minimize fill-in. (AMD is one choice, Approximate Minimum Degree permutation, but there are other schemes.)

热血少△年 2024-07-24 08:30:15

以下是常规数据结构架构的列表。 每种方法都有其优点和缺点,并且适用于出现稀疏矩阵的稍微不同类型的问题。 您可能希望在现有数据结构之上实现它们,例如 List<> 。 和词典<>。

Here's a list of general data structure schemas. Each has its advantages and disadvantages, and are suitable for slightly different kinds of problems where sparse matrices arise. You'd probably want to implement them on top of existing data structures, such as List<> and Dictionary<>.

空宴 2024-07-24 08:30:15

我认为这可以通过使用保存普通数组的类来完成,保存矩阵行之间应用的水平偏移并定义行的条带,例如有效条目的数量。 因此,对于仅定义对角线和两个相邻元素的大型矩阵,您将创建一个 3 * 行数的数组,并将 3 存储为条带宽度。 偏移量取决于矩阵的大小。

我不知道有任何免费软件已经做到了这一点。

I think this could be done by using a class holding plain array, saving the horizontal offset applied between matrix rows and defining stripe of a row, e.g. the number of valid entries. So for a large matrix where only the diagonal and two neighbor elements are defined you'd create an array of 3 * number of rows and store 3 as the stripe width. The offset depends on the size of the matrix.

I'm not aware of anything free which already does this.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文