Time-efficient sparse one-dimensional array of doubles in Java

Posted on 2024-08-14 09:11:38

I need an efficient Java structure to manipulate very sparse vectors of doubles: basic read / write operations. I implemented it in a HashMap but the access is too slow. Should I use another data structure? Do you recommend any free library?

Looking for some peaceful advice :)

Thanks a lot,

Marie

Comments (4)

江南烟雨〆相思醉 2024-08-21 09:11:38

HashMap is the way to go. It shouldn't be slow. Run your code through a profiler to see where all the time goes and then optimize accordingly. If you need tips to optimize the code, post an example here so we can help with a specific issue.

[EDIT] Depending on the size of the indexes, you can use a technique as in Integer.valueOf(int) to cache the objects for boxing. But this will only work when you create lots of maps and the indexes are in a somewhat limited range.
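For illustration, a minimal sketch of such a boxing cache, assuming the indexes fall in a known bounded range. The class name IndexCache and the 2^16 bound are assumptions for the example, not part of the original answer:

// Hypothetical sketch: pre-box a bounded range of indexes once and reuse the
// Integer objects, in the spirit of Integer.valueOf(int)'s internal cache.
// IndexCache and the 1 << 16 bound are made up for illustration.
final class IndexCache {

    private static final int CACHE_SIZE = 1 << 16;            // assumed upper bound on indexes
    private static final Integer[] CACHE = new Integer[CACHE_SIZE];

    static {
        for (int i = 0; i < CACHE_SIZE; i++) CACHE[i] = i;     // box each index exactly once
    }

    static Integer box(int index) {
        // reuse the cached object inside the range, fall back to normal boxing outside it
        return (index >= 0 && index < CACHE_SIZE) ? CACHE[index] : Integer.valueOf(index);
    }
}

// usage: map.put(IndexCache.box(i), value);  map.get(IndexCache.box(i));

Note that this only avoids allocating a fresh Integer per access; the map still stores boxed keys, so it softens the boxing cost rather than removing it.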

Or you can try IntHashMap from commons-lang. It's a bit hard to use (it's package private) but you can copy the code.

Lastly, you could use your own implementation of an int-based HashMap with optimized value lookup for your case.
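As a rough sketch of what such a hand-rolled int-keyed map could look like (open addressing with linear probing; the class name IntDoubleMap is made up, deletion is omitted, and non-negative keys are assumed so that -1 can mark empty slots):

import java.util.Arrays;

// Illustrative only -- not the commons-lang IntHashMap, just one possible
// int -> double open-addressing map that avoids boxing entirely.
class IntDoubleMap {

    private int[] keys;
    private double[] values;
    private int size;

    IntDoubleMap(int expectedEntries) {
        // round up to a power of two so the probe mask works
        int capacity = Integer.highestOneBit(Math.max(16, expectedEntries * 2) - 1) << 1;
        keys = new int[capacity];
        values = new double[capacity];
        Arrays.fill(keys, -1);                        // -1 marks an empty slot
    }

    void put(int key, double value) {
        if (size * 2 >= keys.length) grow();          // keep load factor at or below 0.5
        int i = indexOf(key);
        if (keys[i] == -1) { keys[i] = key; size++; }
        values[i] = value;
    }

    double get(int key) {
        int i = indexOf(key);
        return keys[i] == key ? values[i] : 0.0;      // unset indices read as 0.0
    }

    private int indexOf(int key) {
        int mask = keys.length - 1;
        int i = (key * 0x9E3779B1) & mask;            // cheap integer scramble, then linear probing
        while (keys[i] != -1 && keys[i] != key) i = (i + 1) & mask;
        return i;
    }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[oldKeys.length * 2];
        Arrays.fill(keys, -1);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != -1) put(oldKeys[j], oldValues[j]);
        }
    }
}

Profile before committing to something like this; if the HashMap's slowness really comes from boxing, a primitive-keyed map like the one above (or a free library such as Trove or fastutil, which ship int-to-double hash maps) removes that cost.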

终难愈 2024-08-21 09:11:38

How big is your dataset? Much larger than Integer.MAX_VALUE? The problem is that a HashMap is backed by an array. Collisions will slow performance. Perhaps it's not the mechanism of the HashMap that is too slow, but the fact that you have many collisions. Perhaps if you partitioned your data first (e.g. using another hash function) and then stored each partition in its own HashMap, you'd have more luck.
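A rough sketch of that partitioning idea (the class and its names are made up for illustration; whether it actually helps depends on how bad the collisions really are, so measure first):

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: spread keys over several HashMaps with a second
// hash so each map stays smaller. A sketch, not a benchmark-backed fix.
class PartitionedSparseVector {

    private final Map<Integer, Double>[] partitions;

    @SuppressWarnings("unchecked")
    PartitionedSparseVector(int partitionCount) {
        partitions = new HashMap[partitionCount];
        for (int i = 0; i < partitionCount; i++) partitions[i] = new HashMap<>();
    }

    private Map<Integer, Double> partitionFor(int index) {
        int h = index * 0x9E3779B1;                          // second-level hash
        return partitions[Math.floorMod(h, partitions.length)];
    }

    void put(int index, double value) { partitionFor(index).put(index, value); }

    double get(int index) { return partitionFor(index).getOrDefault(index, 0.0); }
}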

岁吢 2024-08-21 09:11:38

You can copy-paste the sparse vector from my Hapax project: ch.akuhn.matrix.SparseVector

PS: to all the other answers and comments that don't grok why using a map is too slow: it is slow because a map boxes all indices into Integer objects!

The sparse vector presented here is fast for read access and for appending values, but not for putting values at random indices. It is optimal for a scenario where you first create the sparse vector, putting values in order of increasing indices, and later use it mostly for reading.

The important methods in the sparse vector class are:

// ...
import java.util.Arrays;

public class SparseVector {

    /*default*/ int[] keys;
    /*default*/ int size, used;
    /*default*/ double[] values;

    public SparseVector(int size, int capacity) {
        assert size >= 0;
        assert capacity >= 0;
        this.size = size;
        this.keys = new int[capacity];
        this.values = new double[capacity];
    }

    public double get(int key) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        return spot < 0 ? 0 : values[spot];
    }

    public boolean isUsed(int key) {
        return 0 <= Arrays.binarySearch(keys, 0, used, key);
    }

    public double put(int key, double value) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        // overwrite in place if the key already exists; otherwise insert at the encoded insertion point
        if (spot >= 0) return values[spot] = value;
        else return update(-1 - spot, key, value);
    }

    public void resizeTo(int newSize) {
        if (newSize < this.size) throw new UnsupportedOperationException();
        this.size = newSize;
    }

    public int size() {
        return size;
    }

    private double update(int spot, int key, double value) {
        // grow if reaching end of capacity
        if (used == keys.length) {
            int capacity = (keys.length * 3) / 2 + 1;
            keys = Arrays.copyOf(keys, capacity);
            values = Arrays.copyOf(values, capacity);
        }
        // shift values if not appending
        if (spot < used) {
            System.arraycopy(keys, spot, keys, spot + 1, used - spot);
            System.arraycopy(values, spot, values, spot + 1, used - spot);
        }
        used++;
        keys[spot] = key;
        return values[spot] = value;
    }

    public int used() {
        return used;
    }

    public void trim() {
        keys = Arrays.copyOf(keys, used);
        values = Arrays.copyOf(values, used);
    }

}
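For completeness, a hypothetical usage example (not part of the original answer), assuming the SparseVector class above is on the classpath and following the "fill in increasing index order, then mostly read" scenario it describes:

public class SparseVectorDemo {
    public static void main(String[] args) {
        SparseVector v = new SparseVector(1_000_000, 16);  // logical size, initial capacity
        v.put(3, 0.5);                 // append in increasing index order for best performance
        v.put(200_000, 2.75);
        System.out.println(v.get(3));       // 0.5
        System.out.println(v.get(42));      // 0.0 -- index never set
        System.out.println(v.isUsed(42));   // false
        System.out.println(v.used());       // 2 entries actually stored
    }
}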

此生挚爱伱 2024-08-21 09:11:38

For a 1D sparse array, a map is normally the way to go. You only need a library if it's multi-dimensional.

If you compare access times between a map and an array,

   map.get(99);
   array[99];

the map is going to be much slower. Any library would have the same issue.

Isn't that what sparse arrays are all about? You trade time for space.
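A crude way to see that gap for yourself (an illustration only, not a rigorous benchmark; a harness like JMH would give more trustworthy numbers):

import java.util.HashMap;
import java.util.Map;

public class MapVsArrayDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] array = new double[n];
        Map<Integer, Double> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;
            map.put(i, (double) i);        // every put boxes the index into an Integer
        }

        long t0 = System.nanoTime();
        double sumArray = 0;
        for (int i = 0; i < n; i++) sumArray += array[i];
        long t1 = System.nanoTime();

        double sumMap = 0;
        for (int i = 0; i < n; i++) sumMap += map.get(i);   // boxes i again on every lookup
        long t2 = System.nanoTime();

        System.out.printf("array: %d ms, map: %d ms (sums %.1f / %.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sumArray, sumMap);
    }
}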
