Sparse matrices/arrays in Java

I'm working on a project, written in Java, which requires that I build a very large 2-D sparse array. Very sparse, if that makes a difference. Anyway: the most crucial aspect for this application is efficiency in terms of time (assume loads of memory, though not nearly so unlimited as to allow me to use a standard 2-D array -- the key range is in the billions in both dimensions).

Out of the kajillion cells in the array, there will be several hundred thousand cells which contain an object. I need to be able to modify cell contents VERY quickly.

Anyway: Does anyone know a particularly good library for this purpose? It would have to be Berkeley, LGPL or similar license (no GPL, as the product can't be entirely open-sourced). Or if there's just a very simple way to make a homebrew sparse array object, that'd be fine too.

I'm considering MTJ, but haven't heard any opinions on its quality.

Comments (7)

你与清晨阳光 2024-07-17 00:46:20

Probably not the fastest solution at runtime, but the quickest one I could come up with that seems to work. Create an Index class and use it as a key for a SortedMap, like:

    SortedMap<Index, Object> entries = new TreeMap<Index, Object>();
    entries.put(new Index(1, 4), "1-4");
    entries.put(new Index(5555555555L, 767777777777L), "5555555555L-767777777777L");
    System.out.println(entries.size());
    System.out.println(entries.get(new Index(1, 4)));
    System.out.println(entries.get(new Index(5555555555L, 767777777777L)));

My Index class looks like this (with some help from the Eclipse code generator).

public static class Index implements Comparable<Index>
{
    private long x;
    private long y;

    public Index(long x, long y)
    {
        super();
        this.x = x;
        this.y = y;
    }

    public int compareTo(Index index)
    {
        long ix = index.x;
        if (ix == x)
        {
            long iy = index.y;
            if (iy == y)
            {
                return 0;
            }
            else if (iy < y)
            {
                return -1;
            }
            else
            {
                return 1;
            }
        }
        else if (ix < x)
        {
            return -1;
        }
        else
        {
            return 1;
        }
    }

    public int hashCode()
    {
        final int PRIME = 31;
        int result = 1;
        result = PRIME * result + (int) (x ^ (x >>> 32));
        result = PRIME * result + (int) (y ^ (y >>> 32));
        return result;
    }

    public boolean equals(Object obj)
    {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final Index other = (Index) obj;
        if (x != other.x)
            return false;
        if (y != other.y)
            return false;
        return true;
    }

    public long getX()
    {
        return x;
    }

    public long getY()
    {
        return y;
    }
}
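
Since Index already defines hashCode() and equals(), the same key class can also be used with a java.util.HashMap when ordered iteration over the keys is not needed; that gives expected O(1) lookups and updates instead of the TreeMap's O(log n), which matters for the "modify cell contents VERY quickly" requirement. A minimal sketch reusing the Index class above (java.util.Map / java.util.HashMap assumed):

    Map<Index, Object> entries = new HashMap<Index, Object>();
    entries.put(new Index(1, 4), "1-4");
    entries.put(new Index(5555555555L, 767777777777L), "5555555555L-767777777777L");
    System.out.println(entries.get(new Index(5555555555L, 767777777777L)));
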
苦笑流年记忆 2024-07-17 00:46:20

You might look at the la4j (Linear Algebra for Java) library. It supports CRS (Compressed Row Storage) as well as CCS (Compressed Column Storage) internal representations for sparse matrices, which are among the most efficient and fastest internal structures for sparse data.

Here is a brief example of using sparse matrices in la4j:

Matrix a = new CRSMatrix(new double[][]{ // 'a' - CRS sparse matrix
   { 1.0, 0.0, 3.0 },
   { 0.0, 5.0, 0.0 },
   { 7.0, 0.0, 9.0 }
});

Matrix b = a.transpose(); // 'b' - CRS sparse matrix

Matrix c = b.multiply(a, Matrices.CCS_FACTORY); // 'c' = 'b' * 'a'; 
                                                // 'c' - CCS sparse matrix
赠我空喜 2024-07-17 00:46:20

You could just use a nested map, although if you need to do matrix calculus on it that might not be the best option:

 Map<Integer, Map<Integer, Object>> matrix;

Maybe instead of Object, use some tuple type for the actual data so you can work with it more easily after extraction, something like:

class Tuple<T> {
  public final int x;
  public final int y;
  public final T object;

  Tuple(int x, int y, T object) {
    this.x = x;
    this.y = y;
    this.object = object;
  }
}

class Matrix<T> {
  private final Map<Integer, Map<Integer, Tuple<T>>> data = new HashMap<>();

  void add(int x, int y, T object) {
    // Create the row lazily, then store the cell under its column index.
    data.computeIfAbsent(x, k -> new HashMap<>())
        .put(y, new Tuple<>(x, y, object));
  }
}

// etc

Null checks etc. omitted for brevity.
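
For completeness, a hypothetical read-side helper that could be added to the Matrix<T> sketch above (the name is illustrative; it returns null for cells that were never set):

  Tuple<T> get(int x, int y) {
    Map<Integer, Tuple<T>> row = data.get(x);
    return row == null ? null : row.get(y);
  }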

御弟哥哥 2024-07-17 00:46:20

HashMap rocks. Just concatenate the indexes (as strings) with a separator, say '/', using StringBuilder (not + or String.format), and use that as the key. You can't get faster and more memory-efficient than that. Sparse matrices are soo 20th century. :-)
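
For illustration, a minimal, self-contained sketch of this suggestion (the class, method, and variable names are mine, not from the answer):

import java.util.HashMap;
import java.util.Map;

public class StringKeySparseArray {
    private final Map<String, Object> cells = new HashMap<String, Object>();

    private static String key(long i, long j) {
        // Concatenate the two indexes with a '/' separator via StringBuilder.
        return new StringBuilder().append(i).append('/').append(j).toString();
    }

    public void put(long i, long j, Object value) {
        cells.put(key(i, j), value);
    }

    public Object get(long i, long j) {
        return cells.get(key(i, j)); // null for cells that were never set
    }
}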

终止放荡 2024-07-17 00:46:19

Sparse arrays built with hashmaps are very inefficient for frequently read data. The most efficient implementations use a Trie that gives access to a single vector in which segments are distributed.

A Trie can determine whether an element is present in the table by performing only TWO read-only array indexings: to get the effective position where an element is stored, or to know that it's absent from the underlying store.

It can also provide a default position in the backing store for the default value of the sparse array, so that you don't need ANY test on the returned index, because the Trie guarantees that every possible source index will map at least to the default position in the backing store (where you'll frequently store a zero, an empty string, or a null object).

There exist implementations that support fast-updatable Tries, with an optional "compact()" operation to optimize the size of the backing store at the end of multiple operations. Tries are MUCH faster than hashmaps, because they don't need any complex hashing function and don't need to handle collisions for reads (with hashmaps, you have collisions BOTH for reading and for writing; this requires a loop to skip to the next candidate position, and a test on each of them to compare the effective source index...).

In addition, Java Hashmaps can only index on Objects, and creating an Integer object for each hashed source index (this object creation will be needed for every read, not just writes) is costly in terms of memory operations, as it stresses the garbage collector.

I really hoped that the JRE would include an IntegerTrieMap<Object> as the default implementation for the slow HashMap<Integer, Object>, or a LongTrieMap<Object> for the even slower HashMap<Long, Object>... But this is still not the case.



You may wonder what a Trie is.

It's just a small array of integers (in a smaller range than the full range of coordinates for your matrix) that allows mapping the coordinates into an integer position in a vector.

For example, suppose you want a 1024*1024 matrix containing only a few non-zero values. Instead of storing that matrix in an array containing 1024*1024 elements (more than 1 million), you may just want to split it into subranges of size 16*16, and you'll need just 64*64 such subranges.

In that case, the Trie index will contain just 64*64 integers (4096), and there will be at least 16*16 data elements (containing the default zeroes, or the most common subrange found in your sparse matrix).

And the vector used to store the values will contain only 1 copy for subranges that are equal to each other (most of them being full of zeroes, they will be represented by the same subrange).

So instead of using a syntax like matrix[i][j], you'd use a syntax like:

trie.values[trie.subrangePositions[(i & ~15) + (j >> 4)] +
            ((i & 15) << 4) + (j & 15)]

which will be more conveniently handled using an access method for the trie object.

Here is an example, built into a commented class (I hope it compiles OK, as it was simplified; signal me if there are errors to correct):

/**
 * Implement a sparse matrix. Currently limited to a static size
 * (<code>SIZE_I</code>, <code>SIZE_J</code>).
 */
public class DoubleTrie {

    /* Matrix logical options */        
    public static final int SIZE_I = 1024;
    public static final int SIZE_J = 1024;
    public static final double DEFAULT_VALUE = 0.0;

    /* Internal splitting options */
    private static final int SUBRANGEBITS_I = 4;
    private static final int SUBRANGEBITS_J = 4;

    /* Internal derived splitting constants */
    private static final int SUBRANGE_I =
        1 << SUBRANGEBITS_I;
    private static final int SUBRANGE_J =
        1 << SUBRANGEBITS_J;
    private static final int SUBRANGEMASK_I =
        SUBRANGE_I - 1;
    private static final int SUBRANGEMASK_J =
        SUBRANGE_J - 1;
    private static final int SUBRANGE_POSITIONS =
        SUBRANGE_I * SUBRANGE_J;

    /* Internal derived default values for constructors */
    private static final int SUBRANGES_I =
        (SIZE_I + SUBRANGE_I - 1) / SUBRANGE_I;
    private static final int SUBRANGES_J =
        (SIZE_J + SUBRANGE_J - 1) / SUBRANGE_J;
    private static final int SUBRANGES =
        SUBRANGES_I * SUBRANGES_J;
    private static final int DEFAULT_POSITIONS[] =
        new int[SUBRANGES];                 /* zero-filled: every subrange
                                               initially maps to position 0 */
    private static final double DEFAULT_VALUES[] =
        new double[SUBRANGE_POSITIONS];     /* zero-filled, which matches
                                               DEFAULT_VALUE == 0.0 */

    /* Internal fast computations of the splitting subrange and offset. */
    private static final int subrangeOf(
            final int i, final int j) {
        return (i >> SUBRANGEBITS_I) * SUBRANGE_J +
               (j >> SUBRANGEBITS_J);
    }
    private static final int positionOffsetOf(
            final int i, final int j) {
        return (i & SUBRANGEMASK_I) * SUBRANGE_J +
               (j & SUBRANGEMASK_J);
    }

    /**
     * Utility missing in java.lang.System for arrays of comparable
     * component types, including all native types like double here.
     */
    public static final int arraycompare(
            final double[] values1, final int position1,
            final double[] values2, final int position2,
            final int length) {
        if (position1 >= 0 && position2 >= 0 && length >= 0) {
            while (length-- > 0) {
                double value1, value2;
                if ((value1 = values1[position1 + length]) !=
                    (value2 = values2[position2 + length])) {
                    /* Note: NaN values are different from everything including
                     * all NaN values; they are also neither lower than nor
                     * greater than everything including NaN. Note that the two
                     * infinite values, as well as denormal values, are exactly
                     * ordered and comparable with <, <=, ==, >, >=, !=. Note
                     * that in comments below, infinite is considered "defined".
                     */
                    if (value1 < value2)
                        return -1;        /* defined < defined. */
                    if (value1 > value2)
                        return 1;         /* defined > defined. */
                    if (value1 == value2)
                        return 0;         /* defined == defined. */
                    /* One or both are NaN. */
                    if (value1 == value1) /* Is not a NaN? */
                        return -1;        /* defined < NaN. */
                    if (value2 == value2) /* Is not a NaN? */
                        return 1;         /* NaN > defined. */
                    /* Otherwise, both are NaN: check their precise bits in
                     * range 0x7FF0000000000001L..0x7FFFFFFFFFFFFFFFL
                     * including the canonical 0x7FF8000000000000L, or in
                     * range 0xFFF0000000000001L..0xFFFFFFFFFFFFFFFFL.
                     * Needed for sort stability only (NaNs are otherwise
                     * unordered).
                     */
                    long raw1, raw2;
                    if ((raw1 = Double.doubleToRawLongBits(value1)) !=
                        (raw2 = Double.doubleToRawLongBits(value2)))
                        return raw1 < raw2 ? -1 : 1;
                    /* Otherwise the NaN are strictly equal, continue. */
                }
            }
            return 0;
        }
        throw new ArrayIndexOutOfBoundsException(
                "The positions and length can't be negative");
    }

    /**
     * Utility shortcut for comparing ranges in the same array.
     */
    public static final int arraycompare(
            final double[] values,
            final int position1, final int position2,
            final int length) {
        return arraycompare(values, position1, values, position2, length);
    }

    /**
     * Utility missing in java.lang.System for arrays of equalizable
     * component types, including all native types like double here.
     */ 
    public static final boolean arrayequals(
            final double[] values1, final int position1,
            final double[] values2, final int position2,
            final int length) {
        return arraycompare(values1, position1, values2, position2, length) ==
            0;
    }

    /**
     * Utility shortcut for identifying ranges in the same array.
     */
    public static final boolean arrayequals(
            final double[] values,
            final int position1, final int position2,
            final int length) {
        return arrayequals(values, position1, values, position2, length);
    }

    /**
     * Utility shortcut for copying ranges in the same array.
     */
    public static final void arraycopy(
            final double[] values,
            final int srcPosition, final int dstPosition,
            final int length) {
        System.arraycopy(values, srcPosition, values, dstPosition, length);
    }

    /**
     * Utility shortcut for resizing an array, preserving values at start.
     */
    public static final double[] arraysetlength(
            double[] values,
            final int newLength) {
        final int oldLength =
            values.length < newLength ? values.length : newLength;
        System.arraycopy(values, 0, values = new double[newLength], 0,
            oldLength);
        return values;
    }

    /* Internal instance members. */
    private double values[];
    private int subrangePositions[];
    private boolean isSharedValues;
    private boolean isSharedSubrangePositions;

    /* Internal method. */
    private final void reset(
            final double[] values,
            final int[] subrangePositions) {
        this.isSharedValues =
            (this.values = values) == DEFAULT_VALUES;
        this.isSharedSubrangePositions =
            (this.subrangePositions = subrangePositions) ==
                DEFAULT_POSITIONS;
    }

    /**
     * Reset the matrix to fill it with the same initial value.
     *
     * @param initialValue  The value to set in all cell positions.
     */
    public void reset(final double initialValue) {
        if (initialValue == DEFAULT_VALUE) {
            reset(DEFAULT_VALUES, DEFAULT_POSITIONS);
        } else {
            final double[] newValues = new double[SUBRANGE_POSITIONS];
            java.util.Arrays.fill(newValues, initialValue);
            reset(newValues, DEFAULT_POSITIONS);
        }
    }

    /**
     * Reset the matrix to fill it with DEFAULT_VALUE.
     */
    public void reset() {
        reset(DEFAULT_VALUE);
    }

    /**
     * Constructor using a single initial value.
     *
     * @param initialValue  Alternate default value to initialize all
     *                      positions in the matrix.
     */
    public DoubleTrie(final double initialValue) {
        this.reset(initialValue);
    }

    /**
     * Default constructor, filling all positions with DEFAULT_VALUE.
     */
    public DoubleTrie() {
        this.reset();
    }

    /**
     * This is a useful preinitialized instance containing the
     * DEFAULT_VALUE in all cells.
     */
    public static DoubleTrie DEFAULT_INSTANCE = new DoubleTrie();

    /**
     * Copy constructor. Note that the source trie may be immutable
     * or not; but this constructor will create a new mutable trie
     * even if the new trie initially shares some storage with its
     * source when that source also uses shared storage.
     */
    public DoubleTrie(final DoubleTrie source) {
        this.values = (this.isSharedValues =
            source.isSharedValues) ?
            source.values :
            source.values.clone();
        this.subrangePositions = (this.isSharedSubrangePositions =
            source.isSharedSubrangePositions) ?
            source.subrangePositions :
            source.subrangePositions.clone();
    }

    /**
     * Fast indexed getter.
     *
     * @param i  Row of the position to get in the matrix.
     * @param j  Column of the position to get in the matrix.
     * @return   The value stored in matrix at that position.
     */
    public double getAt(final int i, final int j) {
        return values[subrangePositions[subrangeOf(i, j)] +
                      positionOffsetOf(i, j)];
    }

    /**
     * Fast indexed setter.
     *
     * @param i      Row of the position to set in the sparse matrix.
     * @param j      Column of the position to set in the sparse matrix.
     * @param value  The value to set at this position.
     * @return       The passed value.
     * Note: this does not compact the sparse matrix after setting.
     * @see compact(void)
     */
    public double setAt(final int i, final int j, final double value) {
       final int subrange       = subrangeOf(i, j);
       final int positionOffset = positionOffsetOf(i, j);
       // Fast check to see if the assignment will change something.
       int subrangePosition, valuePosition;
       if (Double.compare(
               values[valuePosition =
                   (subrangePosition = subrangePositions[subrange]) +
                   positionOffset],
               value) != 0) {
               /* So we'll need to perform an effective assignment in values.
                * Check if the current subrange to assign is shared or not.
                * Note that we also include the DEFAULT_VALUES which may be
                * shared by several other (not tested) trie instances,
                * including those instantiated by the copy constructor. */
               if (isSharedValues) {
                   values = values.clone();
                   isSharedValues = false;
               }
               /* Scan all other subranges to check if the position in values
                * to assign is shared by another subrange. */
               for (int otherSubrange = subrangePositions.length;
                       --otherSubrange >= 0; ) {
                   if (otherSubrange == subrange)
                       continue; /* Ignore the target subrange. */
                   /* Note: the following test of range is safe with future
                    * interleaving of common subranges (TODO in compact()),
                    * even though, for now, subranges are sharing positions
                    * only between their common start and end position, so we
                    * could as well only perform the simpler test <code>
                    * (otherSubrangePosition == subrangePosition)</code>,
                    * instead of testing the two bounds of the positions
                    * interval of the other subrange. */
                   int otherSubrangePosition;
                   if ((otherSubrangePosition =
                           subrangePositions[otherSubrange]) <=
                           valuePosition &&
                           valuePosition <
                           otherSubrangePosition + SUBRANGE_POSITIONS) {
                       /* The target position is shared by some other
                        * subrange, we need to make it unique by cloning the
                        * subrange to a larger values vector, copying all the
                        * current subrange values at end of the new vector,
                        * before assigning the new value. This will require
                        * changing the position of the current subrange, but
                        * before doing that, we first need to check if the
                        * subrangePositions array itself is also shared
                        * between instances (including the DEFAULT_POSITIONS
                        * that should be preserved, and possible arrays
                        * shared by an external factory constructor whose
                        * source trie was declared immutable in a derived
                        * class). */
                       if (isSharedSubrangePositions) {
                           subrangePositions = subrangePositions.clone();
                           isSharedSubrangePositions = false;
                       }
                       /* TODO: no attempt is made to allocate less than a
                        * fully independent subrange, using possible
                        * interleaving: this would require scanning all
                        * other existing values to find a match for the
                        * modified subrange of values; but this could
                        * potentially leave positions (in the current subrange
                        * of values) unreferenced by any subrange, after the
                        * change of position for the current subrange. This
                        * scanning could be prohibitively long for each
                        * assignment, and for now it's assumed that compact()
                        * will be used later, after those assignments. */
                       /* Remember the old (shared) location of this subrange. */
                       final int oldSubrangePosition = subrangePosition;
                       values = arraysetlength(
                           values,
                           (subrangePositions[subrange] =
                            subrangePosition = values.length) +
                           SUBRANGE_POSITIONS);
                       /* Copy the shared subrange content to its new block. */
                       arraycopy(values, oldSubrangePosition,
                           subrangePosition, SUBRANGE_POSITIONS);
                       valuePosition = subrangePosition + positionOffset;
                       break;
                   }
               }
               /* Now perform the effective assignment of the value. */
               values[valuePosition] = value;
       }
       return value;
    }

    /**
     * Compact the storage of common subranges.
     * TODO: This is a simple implementation without interleaving, which
     * would offer a better data compression. However, interleaving with its
     * O(N²) complexity where N is the total length of values, should
     * be attempted only after this basic compression whose complexity is
     * O(n²) with n being SUBRANGE_POSITIONS times smaller than N.
     */
    public void compact() {
        final int oldValuesLength = values.length;
        int newValuesLength = 0;
        for (int oldPosition = 0;
                 oldPosition < oldValuesLength;
                 oldPosition += SUBRANGE_POSITIONS) {
            boolean commonSubrange = false;
            /* Scan values for possible common subranges. */
            for (int newPosition = newValuesLength;
                    (newPosition -= SUBRANGE_POSITIONS) >= 0; )
                if (arrayequals(values, newPosition, oldPosition,
                        SUBRANGE_POSITIONS)) {
                    commonSubrange = true;
                    /* Update the subrangePositions[] with all matching
                     * positions from oldPosition to newPosition. There may
                     * be several indexes to change, if the trie has already
                     * been compacted() before, and later reassigned. */
                    for (int subrange = subrangePositions.length;
                         --subrange >= 0; )
                        if (subrangePositions[subrange] == oldPosition)
                            subrangePositions[subrange] = newPosition;
                    break;
                }
            if (!commonSubrange) {
                /* Move down the non-common values, if some previous
                 * subranges have been compressed when they were common. */
                if (oldPosition != newValuesLength) {
                    arraycopy(values, oldPosition, newValuesLength,
                        SUBRANGE_POSITIONS);
                    /* Remap every subrange still referencing the old position. */
                    for (int subrange = subrangePositions.length;
                         --subrange >= 0; )
                        if (subrangePositions[subrange] == oldPosition)
                            subrangePositions[subrange] = newValuesLength;
                }
                /* Keep (or copy) these values: advance past them. */
                newValuesLength += SUBRANGE_POSITIONS;
            }
        }
        /* Check the number of compressed values. */
        if (newValuesLength < oldValuesLength) {
            values = arraysetlength(values, newValuesLength);
            isSharedValues = false;
        }
    }

}

Note: this code is not complete, because it handles a single matrix size, and its compactor is limited to detecting only common subranges, without interleaving them.

Also, the code does not determine the best width or height to use for splitting the matrix into subranges (for the x or y coordinates) according to the matrix size. It just uses the same static subrange size of 16 (for both coordinates), but it could conveniently be any other small power of 2 (a non-power of 2 would slow down the int subrangeOf(int, int) and int positionOffsetOf(int, int) internal methods), independently for both coordinates, and up to the maximum width or height of the matrix. Ideally the compact() method should be able to determine the best fitting sizes.

If these splitting subrange sizes can vary, then instance members will be needed for these subrange sizes instead of the static SUBRANGE_POSITIONS, the static methods int subrangeOf(int i, int j) and int positionOffsetOf(int i, int j) will have to become non-static, and the initialization arrays DEFAULT_POSITIONS and DEFAULT_VALUES will need to be dropped or redefined differently.

If you want to support interleaving, basically you'll start by dividing the existing values into two subsets of about the same size (both being a multiple of the minimum subrange size, the first subset possibly having one more subrange than the second one), and you'll scan the larger one at all successive positions to find a matching interleaving; then you'll try to match these values. Then you'll loop recursively by dividing the subsets in halves (also a multiple of the minimum subrange size) and you'll scan again to match these subsets. This multiplies the number of subsets by 2, so you have to ask whether the doubled size of the subrangePositions index is worth it compared to the existing size of the values, i.e. whether it still offers effective compression (if not, you stop there: you have found the optimum subrange size directly from the interleaving compression process). In that case, the subrange size will be mutable during compaction.

But this code shows how you assign non-zero values and reallocate the data array for additional (non-zero) subranges, and then how you can optimize the storage of this data (with compact(), after assignments have been performed using the setAt(int i, int j, double value) method) when there are duplicate subranges that may be unified within the data and reindexed to the same position in the subrangePositions array.

Anyway, all the principles of a trie are implemented there:

  1. It is always faster (and more compact in memory, meaning better locality) to represent a matrix using a single vector instead of a double-indexed array of arrays (each one allocated separately). The improvement is visible in the double getAt(int, int) method !

  2. You save a lot of space, but when assigning values it may take time to reallocate new subranges. For this reason, the subranges should not be too small or reallocations will occur too frequently for setting up your matrix.

  3. It is possible to transform an initial large matrix automatically into a more compact matrix by detecting common subranges. A typical implementation will then contain a method such as compact() above. However, while get() access is very fast and set() is quite fast, compact() may be very slow if there are lots of common subranges to compress (for example when subtracting a large non-sparse randomly-filled matrix from itself, or multiplying it by zero: it will be simpler and much faster in that case to reset the trie by instantiating a new one and dropping the old one).

  4. Common subranges use common storage in the data, so this shared data must be read-only. If you must change a single value without changing the rest of the matrix, you must first make sure that it is referenced only once in the subrangePositions index. Otherwise you'll need to allocate a new subrange anywhere in the values vector (conveniently at the end), and then store the position of this new subrange into the subrangePositions index.



Note that the generic Colt library, though very good as it is, is not as good when working on sparse matrices, because it uses hashing (or row-compressed) techniques which do not implement support for tries for now, even though tries are an excellent optimization, both space-saving and time-saving, notably for the most frequent getAt() operations.

Even the setAt() operation described here for tries saves a lot of time (the way it is implemented here, i.e. without automatic compaction after setting, which could still be implemented based on demand and estimated time, where compaction would still save a lot of storage space at the price of time): the time saving is proportional to the number of cells in subranges, and the space saving is inversely proportional to the number of cells per subrange. A good compromise is then to use a subrange size such that the number of cells per subrange is the square root of the total number of cells in a 2D matrix (it would be a cube root when working with 3D matrices); for the 1024*1024 example above that means 1024 cells per subrange, i.e. 32*32 subranges.

The hashing techniques used in Colt's sparse matrix implementations have the inconvenience that they add a lot of storage overhead and slow down access time due to possible collisions. Tries can avoid all collisions, and can then guarantee a reduction from worst-case linear O(n) time to O(1) time, where n is the number of possible collisions (which, in the case of a sparse matrix, may be up to the number of non-default-value cells in the matrix, i.e. up to the total size of the matrix times a factor proportional to the hashing fill factor, for a non-sparse i.e. full matrix).

The RC (row-compressed) techniques used in Colt are closer to Tries, but this comes at another price: the compression techniques used have very slow access time for the most frequent read-only get() operations, and very slow compaction for setAt() operations. In addition, the compression used is not orthogonal, unlike in this presentation of Tries where orthogonality is preserved. Tries would also preserve this orthogonality for related viewing operations such as striding, transposition (viewed as a striding operation based on integer cyclic modular operations), and subranging (and subselections in general, including sorting views).

I just hope that Colt will be updated at some point to provide another implementation using Tries (i.e. a TrieSparseMatrix in addition to HashSparseMatrix and RCSparseMatrix). The ideas are in this article.

The Trove implementation (based on int->int maps) is also based on hashing techniques similar to Colt's HashedSparseMatrix, i.e. it has the same inconvenience. Tries would be a lot faster, with a moderate amount of additional space consumed (but this space can be optimized, and become even better than Trove and Colt, at a deferred time, using a final compact()ion operation on the resulting matrix/trie).

Note: this Trie implementation is bound to a specific native type (here double). This is deliberate, because a generic implementation using boxed types would have a huge space overhead (and be much slower in access time). Here it just uses native one-dimensional arrays of double rather than a generic Vector. But it is certainly possible to derive a generic implementation for Tries as well... Unfortunately, Java still does not allow writing really generic classes with all the benefits of native types, except by writing multiple implementations (for a generic Object type or for each native type) and serving all these operations via a type factory. The language should be able to instantiate the native implementations and build the factory automatically (for now this is not the case even in Java 7, and it is something where .Net still maintains its advantage for really generic types that are as fast as native types).

陪你到最终 2024-07-17 00:46:19

The following framework for testing Java matrix libraries also provides a good list of them:
https://lessthanoptimal.github.io/Java-Matrix-Benchmark/

Tested Libraries:

* Colt
* Commons Math
* Efficient Java Matrix Library (EJML)
* Jama
* jblas
* JScience (Older benchmarks only)
* Matrix Toolkit Java (MTJ)
* OjAlgo
* Parallel Colt
* Universal Java Matrix Package (UJMP) 
或十年 2024-07-17 00:46:19

This seems to be simple.

You could use a binary tree of the data, using row*maxcolumns+column as an index.

To find an item, you simply calculate row*maxcolumns+column and binary-search the tree looking for it; if it's not there, you can return null (it's O(log n), where n is the number of cells that contain an object).
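
A minimal sketch of this idea using the JDK's TreeMap (a red-black tree) as the binary tree; the class and names are illustrative only, and note that row * maxColumns + column can overflow a long when both dimensions really reach into the billions, in which case a two-field key like the Index class above is safer:

import java.util.Map;
import java.util.TreeMap;

public class KeyedSparseArray {
    private final long maxColumns;
    private final Map<Long, Object> tree = new TreeMap<Long, Object>();

    public KeyedSparseArray(long maxColumns) {
        this.maxColumns = maxColumns;
    }

    private long key(long row, long column) {
        return row * maxColumns + column; // beware of long overflow for huge dimensions
    }

    public void put(long row, long column, Object value) {
        tree.put(key(row, column), value);
    }

    public Object get(long row, long column) {
        return tree.get(key(row, column)); // null if the cell was never set
    }
}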
