SparseArray 源码分析

发布于 2024-12-23 22:18:20 字数 8334 浏览 0 评论 0 收藏 0

SparseArray 映射键值对。区别于对象数组，索引可以是非连续的。与使用 HashMap 映射 Integer 类型的 key 相比，使用 SpaseArray 存储效率更高，一方面它避免了自动对 int 型键自动装箱，另一方面它的数据结构不依赖额外的 Map.Entry 对象。

注意到这个容器使用数组保持映射关系，使用二分搜索寻找键。这种实现不适合存储大量数据项。与传统的 HashMap 比起来要慢，因为使用二分搜索查找，增加和移除操作需要在数组中插入和删除元素。存储上百条数据，性能差距不明显，不到 50%.

为了提高性能，容器增加了移除 key 的优化：它不是立即压缩紧凑数组，而是将移除的 entry 标记为已删除。对于相同 key, 这条数据可以被重用，或者在一次 gc() 操作的过程中紧凑所有移除的数据。当数组增长时或集合容量增加，或者获取数据值的时候，gc() 操作就会在适当的时候执行。

使用 keyAt(int) , valueAt(int) 可以遍历容器中的数据。以升序索引值使用 keyAt(int) 遍历 key 将返回升序的 key, 使用升序 key 遍历将相应返回升序 value.

SparseArray, 即稀疏数组，与 HashMap 不同的是，没有实现 Map 接口，不属于集合框架。定义了类似于集合中常用接口，比如 get(int) , put(int, Object) , remove(int) , size() , clear() .

private static final Object DELETED = new Object();// 移除标记
private boolean mGarbage = false;// 执行 gc() 操作标记，当从数组中删除元素后置为 true

private int[] mKeys;// key 数组
private Object[] mValues;// value 数组
private int mSize;// 数组大小

构造函数创建了初始大小为 initialCapacity 的 key, value 数组。但是 capacity 的计算，SparseArray 与 SparseArrayCompat 调用了不同的实现。

static int binarySearch(int[] array, int size, int 	value) {
  int lo = 0;
  int hi = size - 1;

  while (lo <= hi) {
    final int mid = (lo + hi) >>> 1;
    final int midVal = array[mid];

    if (midVal < value) {
      lo = mid + 1;
    } else if (midVal > value) {
      hi = mid - 1;
    } else {
      return mid;  // value found
    }
  }
  return ~lo;  // value not present
}

基本上 SparseArray 的所有操作都是基于二分搜索，二分搜索的前提是数组是有序的，如果 value 在数组中存在，返回索引位置，否则最后计算出的 lo 值表示插入位置（0 <= low <= size），如果查找不到，取反返回负数。

/**
 * Removes the mapping from the specified key, if there was any.
 */
public void delete(int key) {
  int i =  ContainerHelpers.binarySearch(mKeys, mSize, key);

  if (i >= 0) {
    if (mValues[i] != DELETED) {
      mValues[i] = DELETED;
      mGarbage = true;
    }
  }
}

如果 key 存在的话，移除指定 key 的映射关系，将对应的 value 标记为 DELETED, mGarbage 标记设置为 true, 等待下一次操作触发 gc() 操作。

什么是 gc 操作？

private void gc() {
  // Log.e("SparseArray", "gc start with " + 	mSize);

  int n = mSize;
  int o = 0;
  int[] keys = mKeys;
  Object[] values = mValues;

  for (int i = 0; i < n; i++) {
    Object val = values[i];

    if (val != DELETED) {
      if (i != o) {
        keys[o] = keys[i];
        values[o] = val;
        values[i] = null;
      }

      o++;
    }
  }

  mGarbage = false;
  mSize = o;

  // Log.e("SparseArray", "gc end with " + mSize);
}

这里所谓的 gc 就是对数组进行整理压缩，使用两路指针 i 和 o，将未标记为 DELETED 的映射都向数组的左端移动，很像垃圾收集器的标记-整理算法。

那么哪些操作会触发调用 gc() 呢？

public void put(int key, E value)
public int size()
public int keyAt(int index)
public E valueAt(int index)
public void setValueAt(int index, E value)
public int indexOfKey(int key)
public int indexOfKey(int key)
public int indexOfValue(E value)
public void append(int key, E value)

所有的访问操作在 mGarbage 为 true 的时候都会调用 gc 操作，此外 put, append 操作在数组需要扩容的时候也会调用 gc 操作。

直接看 SparseArrayCompat 的 put 和 append 实现。


/**
 * Adds a mapping from the specified key to the specified value,
 * replacing the previous mapping from the specified key if there
 * was one.
 */
public void put(int key, E value) {
  int i =  ContainerHelpers.binarySearch(mKeys, mSize, key);
  // key 存在，更新 value
  if (i >= 0) {
    mValues[i] = value;
  } else {
    // 取反计算插入位置
    i = ~i;
    // 插入位置非尾部且原来映射已经不存在了，则更新 key-value
    if (i < mSize && mValues[i] == DELETED) {
      mKeys[i] = key;
      mValues[i] = value;
      return;
    }
    // 尝试 gc 一下扩容，重新计算索引
    if (mGarbage && mSize >= mKeys.length) {
      gc();

      // Search again because indices may have changed.
      i = ~ ContainerHelpers.binarySearch(mKeys, mSize, key);
    }
    // 扩容，每次扩容都是以 2 的幂次大小扩容，重新映射
    if (mSize >= mKeys.length) {
      int n =  ContainerHelpers.idealIntArraySize(mSize + 1);

      int[] nkeys = new int[n];
      Object[] nvalues = new Object[n];

      // Log.e("SparseArray", "grow " + mKeys.length + " to " + n);
      System.arraycopy(mKeys, 0, nkeys, 0, mKeys.length);
      System.arraycopy(mValues, 0, nvalues, 0, mValues.length);

      mKeys = nkeys;
      mValues = nvalues;
    }
    // 如果插入的位置不是尾部，需要向右移动插入位置后面的所有映射
    if (mSize - i != 0) {
      // Log.e("SparseArray", "move " + (mSize - i));
      System.arraycopy(mKeys, i, mKeys, i + 1, mSize - i);
      System.arraycopy(mValues, i, mValues, i + 1, mSize - i);
    }
    // 添加映射关系
    mKeys[i] = key;
    mValues[i] = value;
    mSize++;
  }
}

注意到插入映射的 key 保持了升序，因为每次 put 操作都会在已有的升序数组中计算插入的位置。

/**
 * Puts a key/value pair into the array, optimizing for the case where
 * the key is greater than all existing keys in the array.
 */
public void append(int key, E value) {
  // 不是在尾部增加映射
  if (mSize != 0 && key <= mKeys[mSize - 1]) {
    put(key, value);
    return;
  }

  if (mGarbage && mSize >= mKeys.length) {
    gc();
  }

  int pos = mSize;
  if (pos >= mKeys.length) {
    int n =  ContainerHelpers.idealIntArraySize(pos + 1);

    int[] nkeys = new int[n];
    Object[] nvalues = new Object[n];

    // Log.e("SparseArray", "grow " + mKeys.length + " to " + n);
    System.arraycopy(mKeys, 0, nkeys, 0, mKeys.length);
    System.arraycopy(mValues, 0, nvalues, 0, mValues.length);

    mKeys = nkeys;
    mValues = nvalues;
  }

  mKeys[pos] = key;
  mValues[pos] = value;
  mSize = pos + 1;
}

append 操作没什么特别的，只是针对要插入映射 key 大于数组中已存在的最大 key 的情况做了优化，减少了 gc 操作后重新计算索引的操作。

为什么在数据量小的时候 SparseArray 的综合性能相对于 HashMap 要好？

SparseArray 的主要优势是占用内存小，查询的时间复杂度是 O(logn), n 为数组的长度，而 HashMap 的时间复杂度为 O(n), n 为发生 hash 冲突的元素个数。比起来 HashMap 查询性能更优。HashMap 内部数据结构是一个 Map.Entry 数组，Map.Entry 是一个单链表数据结构，维护了键值对，指向下一个元素的引用和 key 的 hash 值。每个映射均增加了内存开销。当数据量大的时候，SparseArray gc 整理内存，put 插入映射都是 O(n) 的时间复杂度，put 操作还会造成数据迁移，而 HashMap 的时间复杂度为 O(1).

如何计算 capacity？

public static int idealByteArraySize(int need) {
  for (int i = 4; i < 32; i++)
    if (need <= (1 << i) - 12)
      return (1 << i) - 12;
  return need;
}

在扩容时计算新的容量大小使用的是上述方法，这个函数是为了方便内存分配，原则是给一个数组分配的内存空间最好是 2 的 n 次方大小，至于为什么要减去 12, 是因为普通对象需要额外 8 字节空间管理对象相关信息（对象头部)，数组另加 4 字节记录数组长度。