Time-efficient sparse one-dimensional array of doubles in Java

Posted on 2024-08-14 09:11:38

I need an efficient Java structure to manipulate very sparse vectors of doubles: basic read / write operations. I implemented it in a HashMap but the access is too slow. Should I use another data structure? Do you recommend any free library?

Looking for some peaceful advice :)

Thanks a lot,

Marie

Comments (4)

江南烟雨〆相思醉 2024-08-21 09:11:38

HashMap is the way to go. It shouldn't be slow. Run your code through a profiler to see where all the time goes and then optimize accordingly. If you need tips to optimize the code, post an example here so we can help with a specific issue.

[EDIT] Depending on the size of the indexes, you can use a technique as in Integer.valueOf(int) to cache the objects for boxing. But this will only work when you create lots of maps and the indexes are in a somewhat limited range.
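For illustration, a minimal sketch of such a boxing cache, assuming the indexes fall in a known bounded range. The class name IndexCache and the 2^16 bound are assumptions for the example, not part of the original answer:

// Hypothetical sketch: pre-box a bounded range of indexes once and reuse the
// Integer objects, in the spirit of Integer.valueOf(int)'s internal cache.
// IndexCache and the 1 << 16 bound are made up for illustration.
final class IndexCache {

    private static final int CACHE_SIZE = 1 << 16;            // assumed upper bound on indexes
    private static final Integer[] CACHE = new Integer[CACHE_SIZE];

    static {
        for (int i = 0; i < CACHE_SIZE; i++) CACHE[i] = i;     // box each index exactly once
    }

    static Integer box(int index) {
        // reuse the cached object inside the range, fall back to normal boxing outside it
        return (index >= 0 && index < CACHE_SIZE) ? CACHE[index] : Integer.valueOf(index);
    }
}

// usage: map.put(IndexCache.box(i), value);  map.get(IndexCache.box(i));

Note that this only avoids allocating a fresh Integer per access; the map still stores boxed keys, so it softens the boxing cost rather than removing it.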

Or you can try IntHashMap from commons-lang. It's a bit hard to use (it's package private) but you can copy the code.

Lastly, you could use your own implementation of an int-based HashMap with optimized value lookup for your case.
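As a rough sketch of what such a hand-rolled int-keyed map could look like (open addressing with linear probing; the class name IntDoubleMap is made up, deletion is omitted, and non-negative keys are assumed so that -1 can mark empty slots):

import java.util.Arrays;

// Illustrative only -- not the commons-lang IntHashMap, just one possible
// int -> double open-addressing map that avoids boxing entirely.
class IntDoubleMap {

    private int[] keys;
    private double[] values;
    private int size;

    IntDoubleMap(int expectedEntries) {
        // round up to a power of two so the probe mask works
        int capacity = Integer.highestOneBit(Math.max(16, expectedEntries * 2) - 1) << 1;
        keys = new int[capacity];
        values = new double[capacity];
        Arrays.fill(keys, -1);                        // -1 marks an empty slot
    }

    void put(int key, double value) {
        if (size * 2 >= keys.length) grow();          // keep load factor at or below 0.5
        int i = indexOf(key);
        if (keys[i] == -1) { keys[i] = key; size++; }
        values[i] = value;
    }

    double get(int key) {
        int i = indexOf(key);
        return keys[i] == key ? values[i] : 0.0;      // unset indices read as 0.0
    }

    private int indexOf(int key) {
        int mask = keys.length - 1;
        int i = (key * 0x9E3779B1) & mask;            // cheap integer scramble, then linear probing
        while (keys[i] != -1 && keys[i] != key) i = (i + 1) & mask;
        return i;
    }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[oldKeys.length * 2];
        Arrays.fill(keys, -1);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != -1) put(oldKeys[j], oldValues[j]);
        }
    }
}

Profile before committing to something like this; if the HashMap's slowness really comes from boxing, a primitive-keyed map like the one above (or a free library such as Trove or fastutil, which ship int-to-double hash maps) removes that cost.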

终难愈 2024-08-21 09:11:38

How big is your dataset? Much larger than Integer.MAX_VALUE? The problem is that a HashMap is backed by an array. Collisions will slow performance. Perhaps it's not the mechanism of the HashMap that is too slow, but the fact that you have many collisions. Perhaps if you partitioned your data first (e.g. using another hash function) and then stored each partition in its own HashMap, you'd have more luck.
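A rough sketch of that partitioning idea (the class and its names are made up for illustration; whether it actually helps depends on how bad the collisions really are, so measure first):

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: spread keys over several HashMaps with a second
// hash so each map stays smaller. A sketch, not a benchmark-backed fix.
class PartitionedSparseVector {

    private final Map<Integer, Double>[] partitions;

    @SuppressWarnings("unchecked")
    PartitionedSparseVector(int partitionCount) {
        partitions = new HashMap[partitionCount];
        for (int i = 0; i < partitionCount; i++) partitions[i] = new HashMap<>();
    }

    private Map<Integer, Double> partitionFor(int index) {
        int h = index * 0x9E3779B1;                          // second-level hash
        return partitions[Math.floorMod(h, partitions.length)];
    }

    void put(int index, double value) { partitionFor(index).put(index, value); }

    double get(int index) { return partitionFor(index).getOrDefault(index, 0.0); }
}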

岁吢 2024-08-21 09:11:38

You can copy-paste the sparse vector from my Hapax project: ch.akuhn.matrix.SparseVector

PS: to all the other answers and comments that don't grok why using a map is too slow: it is slow because a map boxes all indices into Integer objects!

The sparse vector presented here is fast for read access and for appending values, but not for putting values at random indices. It is optimal for a scenario where you first create the sparse vector, putting values in order of increasing indices, and later use it mostly for reading.

The important methods in the sparse vector class are:

// ...
import java.util.Arrays;

public class SparseVector {

    /*default*/ int[] keys;
    /*default*/ int size, used;
    /*default*/ double[] values;

    public SparseVector(int size, int capacity) {
        assert size >= 0;
        assert capacity >= 0;
        this.size = size;
        this.keys = new int[capacity];
        this.values = new double[capacity];
    }

    public double get(int key) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        return spot < 0 ? 0 : values[spot];
    }

    public boolean isUsed(int key) {
        return 0 <= Arrays.binarySearch(keys, 0, used, key);
    }

    public double put(int key, double value) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        // overwrite in place if the key already exists; otherwise insert at the encoded insertion point
        if (spot >= 0) return values[spot] = value;
        else return update(-1 - spot, key, value);
    }

    public void resizeTo(int newSize) {
        if (newSize < this.size) throw new UnsupportedOperationException();
        this.size = newSize;
    }

    public int size() {
        return size;
    }

    private double update(int spot, int key, double value) {
        // grow if reaching end of capacity
        if (used == keys.length) {
            int capacity = (keys.length * 3) / 2 + 1;
            keys = Arrays.copyOf(keys, capacity);
            values = Arrays.copyOf(values, capacity);
        }
        // shift values if not appending
        if (spot < used) {
            System.arraycopy(keys, spot, keys, spot + 1, used - spot);
            System.arraycopy(values, spot, values, spot + 1, used - spot);
        }
        used++;
        keys[spot] = key;
        return values[spot] = value;
    }

    public int used() {
        return used;
    }

    public void trim() {
        keys = Arrays.copyOf(keys, used);
        values = Arrays.copyOf(values, used);
    }

}
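For completeness, a hypothetical usage example (not part of the original answer), assuming the SparseVector class above is on the classpath and following the "fill in increasing index order, then mostly read" scenario it describes:

public class SparseVectorDemo {
    public static void main(String[] args) {
        SparseVector v = new SparseVector(1_000_000, 16);  // logical size, initial capacity
        v.put(3, 0.5);                 // append in increasing index order for best performance
        v.put(200_000, 2.75);
        System.out.println(v.get(3));       // 0.5
        System.out.println(v.get(42));      // 0.0 -- index never set
        System.out.println(v.isUsed(42));   // false
        System.out.println(v.used());       // 2 entries actually stored
    }
}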

此生挚爱伱 2024-08-21 09:11:38

For a 1D sparse array, a map is normally the way to go. You only need a library if it's multi-dimensional.

If you compare access times between a map and an array,

   map.get(99);
   array[99];

the map is going to be much slower. Any library would have the same issue.

Isn't that what sparse arrays are all about? You trade time for space.
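A crude way to see that gap for yourself (an illustration only, not a rigorous benchmark; a harness like JMH would give more trustworthy numbers):

import java.util.HashMap;
import java.util.Map;

public class MapVsArrayDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] array = new double[n];
        Map<Integer, Double> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;
            map.put(i, (double) i);        // every put boxes the index into an Integer
        }

        long t0 = System.nanoTime();
        double sumArray = 0;
        for (int i = 0; i < n; i++) sumArray += array[i];
        long t1 = System.nanoTime();

        double sumMap = 0;
        for (int i = 0; i < n; i++) sumMap += map.get(i);   // boxes i again on every lookup
        long t2 = System.nanoTime();

        System.out.printf("array: %d ms, map: %d ms (sums %.1f / %.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sumArray, sumMap);
    }
}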
