位数组有哪些常见用途?

发布于 2024-07-29 13:11:58 字数 83 浏览 5 评论 0 原文

我已经使用新手手册中的位数组完成了一个示例。 我想知道它们的用途以及它们的一些常见数据结构(假设“数组”是相当宽松的术语。)

谢谢。

I've done an example using bitarrays from a newbie manual. I want to know what they can be used for and what some common data structures for them (assuming that "array" is fairly loose terminology.)

Thanks.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

成熟稳重的好男人 2024-08-05 13:11:58

应用程序 部分列出了一些 位数组 维基百科文章:

由于其紧凑性,位数组在空间或效率非常宝贵的领域有许多应用。 最常见的是,它们用于表示一组简单的布尔标志或布尔值的有序序列。

我们上面提到,位数组用于优先级队列,其中索引 k 处的位当且仅当 k 在队列中时才被设置; 例如,Linux 内核使用此数据结构,并从硬件中的“先查找零”操作中受益匪浅。

位数组可用于内存页、索引节点、磁盘扇区等的分配。在这种情况下,可以使用术语位图。 但是,该术语经常用于指代光栅图像,每个像素可能使用多个位。

位数组的另一个应用是布隆过滤器,这是一种概率集合数据结构,可以在很小的空间中存储大集合,以换取很小的错误概率。 还可以基于接受误报或漏报的位数组构建概率哈希表。

位数组及其上的操作对于构造简洁的数据结构也很重要,它使用接近最小可能的空间。 在这种情况下,查找第 n 个 1 位或计算直到某个位置的 1 位的数量等操作就变得很重要。

位数组也是检查压缩数据流的有用抽象,压缩数据流通常包含占用部分字节或未字节对齐的元素。 例如,单个 8 位字符的压缩霍夫曼编码表示的长度可以是 1 到 255 位之间的任意位置。

在信息检索中,位数组是非常频繁的术语的发布列表的良好表示。 如果我们计算严格递增整​​数列表中相邻值之间的间隙并使用一元编码对其进行编码,则当且仅当 n 在列表中时,结果是第 n 个位置有 1 位的位数组。 间隙为 n 的隐含概率为 1/2n。 这也是哥伦布编码的特例,参数M为1; 通常仅当 -log(2-p)/log(1-p) ≤ 1 时才选择此参数,或者粗略地说该术语出现在至少 38% 的文档中。

There are several listed in the Applications section of the Bit array Wikipedia article:

Because of their compactness, bit arrays have a number of applications in areas where space or efficiency is at a premium. Most commonly, they are used to represent a simple group of boolean flags or an ordered sequence of boolean values.

We mentioned above that bit arrays are used for priority queues, where the bit at index k is set if and only if k is in the queue; this data structure is used, for example, by the Linux kernel, and benefits strongly from a find-first-zero operation in hardware.

Bit arrays can be used for the allocation of memory pages, inodes, disk sectors, etc. In such cases, the term bitmap may be used. However, this term is frequently used to refer to raster images, which may use multiple bits per pixel.

Another application of bit arrays is the Bloom filter, a probabilistic set data structure that can store large sets in a small space in exchange for a small probability of error. It is also possible to build probabilistic hash tables based on bit arrays that accept either false positives or false negatives.

Bit arrays and the operations on them are also important for constructing succinct data structures, which use close to the minimum possible space. In this context, operations like finding the nth 1 bit or counting the number of 1 bits up to a certain position become important.

Bit arrays are also a useful abstraction for examining streams of compressed data, which often contain elements that occupy portions of bytes or are not byte-aligned. For example, the compressed Huffman coding representation of a single 8-bit character can be anywhere from 1 to 255 bits long.

In information retrieval, bit arrays are a good representation for the posting lists of very frequent terms. If we compute the gaps between adjacent values in a list of strictly increasing integers and encode them using unary coding, the result is a bit array with a 1 bit in the nth position if and only if n is in the list. The implied probability of a gap of n is 1/2n. This is also the special case of Golomb coding where the parameter M is 1; this parameter is only normally selected when -log(2-p)/log(1-p) ≤ 1, or roughly the term occurs in at least 38% of documents.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文