Lookup table size reduction
I have an application in which I have to store a couple of million integers in a lookup table. Obviously I cannot hold that amount of data in memory: the table lives on an embedded system, so I am very limited in space. I would like to ask about recommended methods for reducing the size of the lookup table. I cannot use function approximation such as neural networks; the values need to be in a table. The range of the integers is not known at the moment. When I say integers I mean 32-bit values.
Basically the idea is to use some compression method to reduce the amount of memory without losing much precision. This needs to run in hardware, so the computation overhead cannot be very high.
In my algorithm I have to access one value of the table, do some operations with it, and afterwards update the value. In the end what I should have is a function to which I pass an index and get back a value, plus another function to write a value into the table.
I found one method called tile coding, which is based on several lookup tables. Does anyone know any other method?
Thanks.
I'd look at the types of numbers you need to store and pull out the information that's common for many of them. For example, if they're tightly clustered, you can take the mean, store it, and store the offsets. The offsets will have fewer bits than the original numbers. Or, if they're more or less uniformly distributed, you can store the first number and then store the offset to the next number.
It would help to know what your key is to look up the numbers.
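A minimal sketch of the offset idea above, for the tightly clustered case. Everything here is an assumption for illustration: the base value, the function names, and the choice of a 16-bit offset width all depend on how your data actually clusters.

```c
#include <stdint.h>

/* Sketch: store 32-bit values that cluster around a shared base as
 * 16-bit signed offsets, halving the storage per entry.
 * BASE would be the mean (or some representative) of your data set. */
#define BASE 100000u  /* hypothetical cluster center */

/* Encode: caller must ensure the offset fits in 16 signed bits. */
static uint16_t delta_encode(uint32_t value) {
    int32_t off = (int32_t)(value - BASE);
    return (uint16_t)off;  /* fits if -32768 <= off <= 32767 */
}

/* Decode: sign-extend the stored 16 bits and add the base back. */
static uint32_t delta_decode(uint16_t stored) {
    return BASE + (uint32_t)(int32_t)(int16_t)stored;
}
```

For the uniformly distributed case, the same idea applies per-entry: store the first number in full and each subsequent number as a delta from its predecessor.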
I need more detail on the problem. If you cannot store the real values of the integers but only an approximation, that means you are going to reduce (throw away) some of the data (detail), correct? I think you are looking for a hash, which can be an art form in itself. For example, say you have 32-bit values: one hash would be to take the 4 bytes and XOR them together, producing a single 8-bit value. That reduces your storage by a factor of 4, but also discards information from the original data. Typically you could go further and use only some of those 8 bits, say the lower 4, reducing the value even more.
I think the real question is whether you need the data or not. If you need it, you have to compress it or find more memory to store it. If you don't, then use a hash of some sort to reduce the number of bits until the table fits in the memory you have for storage.
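The XOR-fold this answer describes can be written in a few lines; this is just one possible folding, and the function names are illustrative:

```c
#include <stdint.h>

/* Collapse the four bytes of a 32-bit value into one 8-bit code by
 * XORing them together.  Lossy: many inputs map to the same output. */
static uint8_t xor_fold8(uint32_t v) {
    return (uint8_t)((v >> 24) ^ (v >> 16) ^ (v >> 8) ^ v);
}

/* Further reduction: keep only the lower 4 bits of the folded byte. */
static uint8_t xor_fold4(uint32_t v) {
    return xor_fold8(v) & 0x0F;
}
```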
Read http://www.cs.ualberta.ca/~sutton/RL-FAQ.html
Perhaps that applies. Also, update your question with additional facts -- don't merely answer in the comments.
Edit.
A bit array can easily store a bit for each of your millions of numbers. Let's say you have numbers in the range of 1 to 8 million. In a single megabyte of storage you can have a 1 bit for each number in your set and a 0 for each number not in your set.
If you have numbers in the range of 1 to 32 million, you'll require 4 MB of memory for a big table covering all 32M distinct numbers.
See my answer to Modern, high performance bloom filter in Python? for a Python implementation of a bit array of unlimited size.
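A C sketch of the bit array described above (rather than the linked Python one), sized for the 1-to-8-million example so the whole table fits in roughly 1 MB; the names are illustrative:

```c
#include <stdint.h>

/* One bit per possible number in 1..8,000,000: membership in ~1 MB. */
#define MAX_N 8000000u
static uint8_t bits[MAX_N / 8 + 1];  /* zero-initialized: all absent */

static void bit_set(uint32_t n)   { bits[n / 8] |= (uint8_t)(1u << (n % 8)); }
static void bit_clear(uint32_t n) { bits[n / 8] &= (uint8_t)~(1u << (n % 8)); }
static int  bit_test(uint32_t n)  { return (bits[n / 8] >> (n % 8)) & 1; }
```

Note this only records *which* numbers are present; if each entry must also carry an associated value (as the question's read/update functions suggest), a plain bit array is not enough on its own.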
If you are merely looking for the presence of the number in question, a Bloom filter might be what you are looking for. Honestly, though, your question is fairly vague and confusing. It would help to explain what the Q values are and what you do with them once you find them in the table.
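A toy Bloom-filter sketch, to make the suggestion concrete: a fixed bit array plus two cheap hash functions. The sizes and multiplier constants here are arbitrary illustrative choices, not a tuned design; a Bloom filter can report false positives but never false negatives.

```c
#include <stdint.h>

#define BLOOM_BITS 8192u
static uint8_t bloom[BLOOM_BITS / 8];  /* zero-initialized: empty filter */

/* Two cheap, independent-ish hashes over the key (constants arbitrary). */
static uint32_t h1(uint32_t x) { return (x * 2654435761u) % BLOOM_BITS; }
static uint32_t h2(uint32_t x) { return (x * 40503u + 2166136261u) % BLOOM_BITS; }

static void bloom_add(uint32_t x) {
    bloom[h1(x) / 8] |= (uint8_t)(1u << (h1(x) % 8));
    bloom[h2(x) / 8] |= (uint8_t)(1u << (h2(x) % 8));
}

/* 1 means "possibly present", 0 means "definitely absent". */
static int bloom_maybe_contains(uint32_t x) {
    return ((bloom[h1(x) / 8] >> (h1(x) % 8)) & 1) &&
           ((bloom[h2(x) / 8] >> (h2(x) % 8)) & 1);
}
```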
If your set of integers is homogeneous, then you could try a hash table, because there is a trick you can use to cut the size of the stored integers in half in your case.
Assume the integer n can serve as its own hash, since the set is homogeneous. Say you have 0x10000 (64k) buckets; each bucket index is iBucket = n & 0xFFFF. Each item in a bucket then only needs to store 16 bits, since the low 16 bits are already encoded by the bucket index. The other thing you have to do to keep the data small is to store a count of the items in each bucket and use an array to hold them; a linked list would be too large and slow. When you iterate over the array looking for a match, remember you only need to compare the 16 stored bits.
So assume a bucket is a pointer to an array plus a count; on a 32-bit system that is 64 bits at most. If the number of ints were small enough we might be able to do something fancier and use only 32 bits per bucket. 64k buckets * 8 bytes = 512 KB, and 2 million 16-bit values = 4 MB, about 4.5 MB total versus 8 MB raw. So this gives you a way to look up the ints and roughly 40% compression.
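The bucket scheme above can be sketched as follows. This is a membership-only illustration with hypothetical names, and it omits realloc-failure and per-bucket overflow handling that real code would need:

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 0x10000u  /* 65,536 buckets, indexed by the low 16 bits */

typedef struct {
    uint16_t *items;  /* high 16 bits of each stored integer */
    uint16_t  count;
} Bucket;

static Bucket table[NBUCKETS];  /* zero-initialized: all buckets empty */

static void table_insert(uint32_t n) {
    Bucket *b = &table[n & 0xFFFFu];
    /* Grow-by-one realloc keeps the sketch short; batch growth is better. */
    b->items = realloc(b->items, (b->count + 1u) * sizeof(uint16_t));
    b->items[b->count++] = (uint16_t)(n >> 16);
}

static int table_contains(uint32_t n) {
    const Bucket *b = &table[n & 0xFFFFu];
    uint16_t hi = (uint16_t)(n >> 16);
    for (uint16_t i = 0; i < b->count; i++)
        if (b->items[i] == hi) return 1;  /* only 16 bits compared */
    return 0;
}
```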