2d-sparse-bitmaps 中文文档教程

发布于 4年前 浏览 23 项目主页 更新于 3年前

2d-sparse-bitmaps

CI StatusDependenciesDev Dependencies

Node.js 10 或更高版本,无需依赖项。

允许灵活的后备存储选择,主要支持的是 Redis(通过 ioredis).

底层的分块实现是高效的; 下面的例子只需要 64 个字节来表示两个坐标,这两个坐标在对角线上彼此相距约 1,414,213 个单位:

const TwoD = require('2d-sparse-bitmaps');
const bitmap = new TwoD.SparseBitmap({ [TwoD.ChunkWidthKey]: 16 });

await bitmap.set('so-far-away', 0, 0);
await bitmap.set('so-far-away', 1e6, 1e6);

此外,数据结构的稀疏特性允许通过

Installation

$ npm install 2d-sparse-bitmaps

Examples

包含的测试基准测试工具 & cli 希望提供充足的使用示例,并且在 cli 的情况下,它是一种易于访问的交互式工作方式与图书馆。

Usage

所有坐标都必须无符号,这一限制可能会在未来的版本中移除。

Instantiation

使用默认的内存存储:

const TwoD = require('2d-sparse-bitmaps');
const bitmap = new TwoD.SparseBitmap();

使用 ioredis 实例:

const Redis = require('ioredis');
const TwoD = require('2d-sparse-bitmaps');

const rConn = new Redis();
const bitmap = new TwoD.SparseBitmap({ [TwoD.BackingStoreKey]: rConn });

Full options

Constant NameDescriptionDefaultRestrictions
ChunkWidthKeyThe width in bits of each chunk in the sparse bitmap128>= 8, must be a multiple of 8
KeyPrefixKeyThe string preprended to each key before being passed onto the backing store.sparse-bitmapnone
BackingStoreKeyThe backing store instance to be used.InMemoryStoreMust conform to the aforementioned interface.

每个块最多需要 (X / 8) * X 字节的存储空间,其中 X 是选择的块宽度。 因此,默认块宽度 128 需要每个块 2048 字节。

Backing store interface

任何后备存储都必须实现此接口:

getbit(key, bitPosition);
setbit(key, bitPosition, value);
getBuffer(key);

它可以选择实现 pipeline(),它必须返回一个实现上述接口的实例 plus exec()用于流水线执行; 此外,此实例的后备存储接口方法必须接受类型为 function (err, result) 的附加回调参数。

默认的 InMemoryStore 提供了一个示例实现 sans pipeline() 等。 阿尔。

Get:

无法在 pipelinedMutate() 上下文中调用。 没有管道这些调用的选项可用; 应该使用 inBounds() 执行大量搜索。

The bit at (x, y) in key:

const xySet = await bitmap.get(key, x, y);

Set:

The bit at (x, y) in key:

await bitmap.set(key, x, y);

Unset:

key(x, y) 处的位:

await bitmap.unset(key, x, y);

Get all set-bits in given bounds:

无法在 pipelinedMutate() 上下文中调用。 此方法执行其自己的内部流水线以尽可能提高性能。

在边界框定义中 - 这里命名为 bBox - from 是左上坐标,to 是右下坐标:

const bBox = {
  from: { x: …, y: … },
  to: { x: …, y: … }
};

const allInBounds = await bitmap.inBounds(key, bBox);

allInBounds 将是一个双元素列表(元组)的列表,其中 x 坐标是第一个值 ([0]) 和 y 是第二个 ([1])。

有第三个可选参数,strict,如果设置为true,将在返回之前剔除列表,只包含严格在给定边界框内的点; 否则,将返回与边界框相交的任何块内的点

Get a key-bound instance:

返回的实例与上面的接口相同除了所有方法不再接受key参数,因为tgey现在都绑定到key 特别是:

const occupiedBitmap = bitmap.boundToKey('occupied');
await occupiedBitmap.set(x, y);
const check = await occupiedBitmap.get(x, y);
await occupiedBitmap.unset(x, y);
const occupiedInBounds = await occupiedBitmap.inBounds({ … });

Pipelined mutation:

当大量和/或频繁地执行许多突变(set()unset())时,pipelineMutate() 方法应该用于提供一个上下文,在该上下文中,对这些修改器的任何调用 - 包括由 boundToKey() 生成的实例的方法 - 将被适当地流水线化,因此作为原子单元执行:

const bitmap = new TwoD.SparseBitmap({ [TwoD.BackingStoreKey]: new Redis() });
const keyBound = bitmap.boundToKey('foobar');

const scopeReturn = await bitmap.pipelinedMutate(async () => {
  for (…) {
    await bitmap.set('foobar', x, y);
  }

  for (…) {
    await keyBound.unset(x, y);
  }
});

作用域函数返回后,管道将被执行(结果被丢弃)并且作用域函数的结果由 pipelinedMutate() 返回。

当与不支持流水线的后备存储一起使用时,修改器会正常执行(在调用时针对存储),因此此方法可以与任何后备存储一起使用,而不管其流水线功能或缺乏流水线功能。

构建大型管道可能会导致显着的运行时内存压力,因此有可能导致内存不足的情况。 建议消费者创建合理大小的管道,尽管作者(还)没有提供关于什么是“合理大小”的指导。

SparseBitmap 的访问器方法 - get()inBounds() - 不能在管道上下文中调用!这样做是一个编程错误,将导致抛出异常。

Contributing

欢迎任何和所有贡献,该项目随时接受拉取请求。

Testing

完整的测试套件(包括 linting)通过以下方式运行:

$ npm test

如果 NODE_ENV 环境变量设置为 ci,将完全跳过所有针对 redis 的测试。

在针对 redis 运行测试时,假定主机是标准端口上的本地计算机。 此外:

  • REDIS_LOCAL_AUTH will be used as the connection password, if set.
  • REDIS_LOCAL_DB will select the redis DB to test within, if set.

利用 tape,每个单独的测试文件都可以单独执行:

$ node test/default-store.js
…
# ok

Authors

Benchmarks

以下基本基准在运行 Ubuntu 20LTS AMI(ID ami-03ceeb0f46ee38ce7)的 AWS EC2 类型 m5a.large(4 个 vCPU,16GB RAM)上执行。 后备存储 redis 实例和 benchmark 脚本都在此 VM 上运行。 所有 redis 快照 已禁用。 EC2 实例是专门为此用途创建的,在基准测试过程中没有其他工作负载。

版本细节:

  • Linux kernel 5.4.0-1020-aws
  • Node 14.6.0 (v8 8.4.371.19-node.12)
  • Redis 5.0.7 (636cde3b5c7a3923) (standalone)

基准运行包括三个不同乘数(1、3 和 5)的运行,每个 101 次迭代定义在 benchmark:: T.seq.main()

主要收获是流水线操作提供的性能显着提高。

根据这个有限的数据集,平均搜索速率为 ~1.2 Gbit/s (~150 MB/s),尽管它确实取决于搜索区域的稀疏程度(速率从~493 Mbit/s - 观察到 ~2 Gbit/s)。 如果不启用流水线,该速率将急剧 下降至约 183 Mbit/s(下降 6.5 倍)。

完整的数据集链接如下,用于后处理和分析数据的 Google 表格为">可在此处获取。

Raw data

License

麻省理工学院

2d-sparse-bitmaps

CI StatusDependenciesDev Dependencies

A two-dimensional sparse bitmap implementation for Node.js 10 or later, with no required dependencies.

Allows for flexible backing store choice, the primary supported being Redis (via ioredis).

The underlying chunked implementation is efficient; the following example needs only 64 bytes to represent two coordinates which are ~1,414,213 units distant each other on the diagonal:

const TwoD = require('2d-sparse-bitmaps');
const bitmap = new TwoD.SparseBitmap({ [TwoD.ChunkWidthKey]: 16 });

await bitmap.set('so-far-away', 0, 0);
await bitmap.set('so-far-away', 1e6, 1e6);

Additionally, the sparse nature of the data structure allows for performant queries within specified bounds via inBounds().

Installation

$ npm install 2d-sparse-bitmaps

Examples

The included tests, benchmarking tool & cli hopefully provide ample usage examples, and in the case of the cli an easily-accessibly interactive means of working with the library.

Usage

All coordinates must be unsigned, a limitation that may be removed in future releases.

Instantiation

With the default in-memory store:

const TwoD = require('2d-sparse-bitmaps');
const bitmap = new TwoD.SparseBitmap();

With an ioredis instance:

const Redis = require('ioredis');
const TwoD = require('2d-sparse-bitmaps');

const rConn = new Redis();
const bitmap = new TwoD.SparseBitmap({ [TwoD.BackingStoreKey]: rConn });

Full options

Constant NameDescriptionDefaultRestrictions
ChunkWidthKeyThe width in bits of each chunk in the sparse bitmap128>= 8, must be a multiple of 8
KeyPrefixKeyThe string preprended to each key before being passed onto the backing store.sparse-bitmapnone
BackingStoreKeyThe backing store instance to be used.InMemoryStoreMust conform to the aforementioned interface.

Each chunk requires up to (X / 8) * X bytes of storage, where X is the chosen chunk width. Accordingly, the default chunk width of 128 requires 2048 bytes per chunk.

Backing store interface

Any backing store must implement this interface:

getbit(key, bitPosition);
setbit(key, bitPosition, value);
getBuffer(key);

It may optionally implement pipeline(), which must return an instance implementing the aforementioned interface plus exec() for pipeline execution; additionally, the backing store interface methods of this instance must accept an additional callback argument of type function (err, result).

The default InMemoryStore provides an example implementation sans pipeline() et. al.

Get:

Cannot be called within pipelinedMutate() context. No option to pipeline these calls is available; high-volume searches should be performed with inBounds() instead.

The bit at (x, y) in key:

const xySet = await bitmap.get(key, x, y);

Set:

The bit at (x, y) in key:

await bitmap.set(key, x, y);

Unset:

The bit at (x, y) in key:

await bitmap.unset(key, x, y);

Get all set-bits in given bounds:

Cannot be called within pipelinedMutate() context. This method performs it's own internal pipelining to be as performant as possible.

Within the bounding box definition - here named bBox - from is the top-left coordinate and to is the bottom-right coordinate:

const bBox = {
  from: { x: …, y: … },
  to: { x: …, y: … }
};

const allInBounds = await bitmap.inBounds(key, bBox);

allInBounds will be a list of two-element lists (tuples), where the x coordinate is the first value ([0]) and y is the second ([1]).

Has a third optional parameter, strict, which if set to true will cull the list before return to only include points strictly within the given bounding box; otherwise, points within any chunk intersected by the bounding box will be returned.

Get a key-bound instance:

The returned instance has the same interace as above except that all methods no longer take the key argument, as tgey are all now bound to key specifically:

const occupiedBitmap = bitmap.boundToKey('occupied');
await occupiedBitmap.set(x, y);
const check = await occupiedBitmap.get(x, y);
await occupiedBitmap.unset(x, y);
const occupiedInBounds = await occupiedBitmap.inBounds({ … });

Pipelined mutation:

When executing many mutations (set() and unset()) in high volume and/or frequency, the pipelineMutate() method should be used to provide a context in which any calls to these mutators - including methods of instances produced by boundToKey() - will be pipelined appropriately and therefore executed as an atomic unit:

const bitmap = new TwoD.SparseBitmap({ [TwoD.BackingStoreKey]: new Redis() });
const keyBound = bitmap.boundToKey('foobar');

const scopeReturn = await bitmap.pipelinedMutate(async () => {
  for (…) {
    await bitmap.set('foobar', x, y);
  }

  for (…) {
    await keyBound.unset(x, y);
  }
});

Upon return of the scoped function, the pipeline will be executed (with the results being discarded) and the result of the scoped function returned by pipelinedMutate().

When used with a backing store that does not support pipelining, the mutators are executed normally (against the store at call-time), therefore this method may be used with any backing store regardless of its pipelining capability or lack thereof.

Building a large pipeline may cause significant runtime memory pressure and as such has the potential to cause out-of-memory conditions. Consumers are advised to create reasonably-sized pipelines, though the author does not (yet) provide guidance as to what qualifies "reasonably-sized".

The accessor methods of SparseBitmap - get() and inBounds() - cannot be called within a pipelined context! Doing so is a programming error and will result in an exception being thrown.

Contributing

Any and all contributions are welcome and the project is accepting of pull requests at all times.

Testing

The full test suite (including linting) is run via:

$ npm test

If the NODE_ENV environment variable is set to ci, all tests against redis will be skipped entirely.

When running the tests against redis, the host is assumed to be the local machine on the standard port. Additionally:

  • REDIS_LOCAL_AUTH will be used as the connection password, if set.
  • REDIS_LOCAL_DB will select the redis DB to test within, if set.

Utilizing tape, each individual test file can be executed in isolation:

$ node test/default-store.js
…
# ok

Authors

Benchmarks

The following rudimentary benchmarks were performed on an AWS EC2 type m5a.large (4 vCPUs, 16GB RAM) running the Ubuntu 20LTS AMI (ID ami-03ceeb0f46ee38ce7). Both the backing store redis instance and benchmark script were run on this VM. All redis snapshotting was disabled. The EC2 instance was created specifically for this use and had no other workloads during the benchmarking process.

Version particulars:

  • Linux kernel 5.4.0-1020-aws
  • Node 14.6.0 (v8 8.4.371.19-node.12)
  • Redis 5.0.7 (636cde3b5c7a3923) (standalone)

The benchmark run consisted three runs of differing multipliers (1, 3 & 5), each 101 iterations of the full sequence defined in benchmark::T.seq.main().

The primary takeaway is the significant increase in performance afforded by pipelined operations.

According to this limited dataset, the mean search rate is ~1.2 Gbit/s (~150 MB/s), though it does depend on how sparsely-populated the search area is (with rates from ~493 Mbit/s - ~2 Gbit/s observed). Without pipelining enabled, this rate drops drastically to ~183 Mbit/s (a 6.5x drop).

The full data set is linked below, and the Google Sheet used to post-process and analyze the data is available here.

Raw data

License

MIT

    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文