为嵌入式应用程序选择 C 语言的唯一标识符
我目前正在尝试实现一种算法来选择唯一的 (16 位)标识符。 挑战在于以快速的方式做到这一点 不使用太多内存。 当前使用的标识符列表是 通过一系列扫描外部闪存设备来确定 因此 SPI 事务是一个相对较慢的过程。 还, 该算法将在小型微控制器上运行,所以我不能 实际上只是将所有条目读入 RAM 并在那里处理它们。
到目前为止我的想法是:
- 选择一个数字,然后浏览列表并查看它是否被使用。 冲洗 并重复。 速度相当慢(特别是如果有 有很多文件)。
- 如上所述,但使用伪随机数生成器选择数字 用合适的种子。 这样做的好处是比较少 很可能会有如此大量的迭代。
- 扫描列表并使用所有条目填充数组 成立。 对其进行排序,然后它就变得微不足道了。 这可以使用 巨大的内存量。
- 使用一个巨大的(好吧,巨大得可笑)位掩码。 并不真地 实际的。
- 接受工具的使用寿命,它会被扔掉 在将 65534 个文件写入闪存之前很久就已离开或“格式化”, 因此只需将迄今为止使用的最高值存储在闪存或备份存储器中并保留 递增。 老实说,这可能会非常有效 具体应用。
目前,我正准备使用第二个或第五个, 但我很想知道是否有人有其他想法。 我想 认为有一种与 CRC 形式类似的算法可用于 依次处理每个数字并给出尚未处理过的数字的大致想法 使用过,但我不知道这会如何工作。
I am currently trying to implement an algorithm to select a unique
(16-bit) identifier. The challenge is to do this in an fast way that
doesn't use too much memory. The list of currently used identifiers is
determined through scanning an external Flash device via a sequence of
SPI transactions and is therefore a relatively slow process. Also,
the algorithm will be running on a small-ish microcontroller, so I can't
really just read all the entries into RAM and process them there.
The thoughts I've had so far are:
- Pick a number, then scan through the list and see if it's used. Rinse
and repeat. Suffers from being rather slow (particularly if there
are a lot of files). - As above, but pick the number using a pseudo-random number generator
with an appropriate seed. This has the advantage that it's less
likely that there will be such large numbers of iterations. - Scan through the list and populate an array with all the entries
found. Sort it and then it becomes trivial. This could use an
enormous amount of memory. - Use an enormous (okay, ridiculously enormous) bit mask. Not really
practical. - Accept that the life-time of the tool is such that it will be thrown
away or 'formatted' long before it has written 65534 files to the Flash,
so just store the highest value used so far in the Flash or Backup memory and keep
incrementing. In all honesty, this would probably work quite well for this
specific application.
At the moment, I'm verging towards using either the second one or the fifth,
but I'd be interested to know if anyone has any other thoughts. I'd like to
think that there's an algorithm similar in form to a CRC that could be used to
process each number in turn and give a fair idea of a number that hasn't been
used, but I've no idea how this might work.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(12)
我认为您在这里有一些选择,但还需要考虑一个 Bloom Filter。 这可能会出现误报(即您可能会排除某个 ID,即使该 ID 尚未使用过),但它可以让您选择专用于该数据的确切空间量。
I think you have some options here, but one more to consider is a Bloom Filter. This has a chance of false positives (i.e. you may rule out an ID as already used even though it hasn't been) but it can allow you to choose the exact amount of space you can dedicate to this data.
如果没有足够的 RAM 来实现足够大的位图以容纳 64K 条目,则可以通过为每次扫描使用较小的临时位图来减少通过闪存查找未使用 ID 的扫描次数。 如果位图中至少有一个未标记的位,则 16 字节位图可以在第一次扫描时记录 0-255 范围内找到的 ID,在第二次扫描时记录 256-511 范围内的 ID,依此类推。重做。 我相信这与随机起始范围的使用相结合会很好地发挥作用。
另一方面,如果我对选项 5 有很高的信心,我可能会选择它。
If there isn't enough RAM to implement a bitmap large enough for 64K entries, the number of scans through FLASH to find an unused ID could be reduced by using a smaller, temporary bitmap for each scan. A 16-byte bitmap could record found IDs in the range 0-255 on the first pass, 256-511 on the second scan, etc., at the end of each scan if there is at least one unmarked bit in the bitmap you're done. I believe this would work well in combination with the use of a random starting range.
On the other hand, if I had high confidence in option 5 I might just go with that.
我猜测由于提到了 SPI,FLASH 设备不可移除,但 IIRC SD 卡具有 SPI 访问模式,因此这可能不是真的。
如果闪存是永久性的,并且您有一个强大的、非易失性的地方来记住最后发布的 ID,那么这可能就是要做的事情。 它在运行时确实速度快且内存少。 它应该易于解释、实施和测试。
如果闪存是可拆卸的,那么使用伪随机数生成器并测试碰撞可能是可行的方法。 假设您的数量分布均匀,则很容易从使用的总数中预测发生碰撞的可能性。 只需选择一个具有相当长的重复间隔的生成器即可。 在模拟中对此进行模拟作为所选算法的验收测试可能是个好主意。
I'm guessing that the FLASH device is not removable due to the mention of SPI, but IIRC SD cards have an SPI access mode, so that may not be true.
If the FLASH is permanent and you have a robust, non-volatile place to remember the last ID issued, then that is probably the thing to do. It is certainly fast and low memory at run time. It should be easy to explain, implement and test.
If the FLASH is removable, then using a pseudo-random number generator and testing for collisions is probably the way to go. Assuming your numbers are well distributed, your chances of a collision are easy to predict from the total in use. Just pick a generator with a decently long repeat interval. It may be a good idea to mock this up in a simulation as an acceptance test for the selected algorithm.
我想知道为什么你不简单地存储最后一个 ID 并递增它。 你还有犹豫的理由吗? 你没有回答你的问题,只是一种普遍的不安。
如果出于安全原因需要 ID 具有一定的随机性,请使用随机数生成器并将生成器的当前寄存器值保存在闪存中。 这样,您可以为下一个 ID 加载它们,如果您仔细选择算法,则可以确保您获得完整的周期长度而不会重复。
[编辑] 由于您关心冲突,因此必须有一些可能发生冲突的数据,例如文件名等。 如果明显的方法(创建文件名并检查它是否存在)太慢并且“分配映射”中存在巨大间隙,则生成一个随机 ID 并检查它。 这应该允许您只需几次迭代即可找到未使用的 ID。
I'm wondering why you don't simply store the last ID and increment it. Is there a reason why you hesitate? You don't give one on your question, just a general uneasiness.
If you need the ID to be somewhat random for security reasons, then use a random number generator and save the current register values of the generator in the flash memory. This way, you can load them for the next ID which makes sure you'll get the full cycle length without repeats if you choose your algorithm carefully.
[EDIT] Since you're concerned with collisions, there must be some data where the collision can occur, for example in file names or some such. If the obvious approach (create a filename and check whether it exists) is too slow and you have huge gaps in the "allocation map", then generate a random ID and check with that. This should allow you to find an unused ID with just a few iterations.
使用最大线性反馈移位寄存器并存储您提供的最后一个值出去。 给定一个特定的起始点(不包括零),LFSR 将以伪随机顺序给出序列 1..2^n 中的所有数字。 如果从第 k 个元素开始,您将始终得到相同的第 k+1 个元素。 实现很小:
其中反馈是来自您想要生成的位数的最大反馈表的位模式。 这意味着要生成下一个数字,您需要读取最后发出的数字,通过上面的代码运行它,然后使用它。
或者,为什么不直接计数并存储最后给出的数字呢?
Use a Maximal Linear Feedback Shift Register and store the last value you handed out. An LFSR will, given a particular starting point (not including zero) give you all the numbers in the sequence 1..2^n, in a pseudorandom order. If you start with the kth element, you will always get the same k+1th element. The implementation is tiny:
where feedback is a bit pattern from a table of maximal feedbacks for the number of bits you want to generate. This means that to generate the next number, you read the last number handed out, run it through the above code and then use it.
Alternately, why aren't you just counting up and storing the last number given out?
我会选择 2。但是要小心你如何选择生成器和种子。 所有伪数序列在经过一定次数的迭代后都会自我重复。 所以你需要测试一下你的情况,确保它不会很快重演。
I will go with 2. Be careful however how you pick the generator and the seed. All pseudo-number sequences repeat them selves after some number of iterations. So you need to test yours that it won't repeat itself too soon.
我会尝试 3 的变体。我不会存储排序的值数组,而是存储排序的范围数组。
I would try a variant of 3. Instead of storing a sorted array of values, I would store a sorted array of ranges.
你有多少内存? 很难说清楚,“嵌入式”现在的含义如此之多。 :) 位图在生成期间需要 8192 字节,并保证每次都有完美的结果。
我还考虑过某种“稀疏”位图,但我不知道合适的数据结构,但可能值得研究。
How much RAM do you have? It's a bit hard to tell, "embedded" can mean so much these days. :) A bitmap would require 8192 bytes, for the duration of the generation, and guarantee perfect results every time.
I also considered some kind of "sparse" bitmap, but I don't know a suitable data structure off hand, but might be worth investigating.
保持一个连续的ID的运行计数。
通过MD5运行ID。
使用最低 16 位。
理论上,MD5 为每个输入创建不同的哈希值。 最低 16 位应该与整个散列一样“随机”。
Keep a running count a sequential ID.
Run the ID through MD5.
Use the lowest 16-bits.
The theory is that MD5 creates a different hash for each input. The lowest 16-bits should be just as "random" as the whole hash.
如果你可以使用更大的 id,那么 5 是理所当然的。
If you could use bigger ids then 5 would be a no-brainer.
关于您对 CRC 等算法的兴趣...
您似乎正在寻找一种算法,该算法将运行小于 64K 16 位数字的随机列表,并生成列表中尚未存在的新 16 位数字; 最好在给定列表中单次执行此操作。
如果释放 ID 的顺序与分配 ID 的顺序无关(就像我认为是您的情况),则您无法通过生成或分配 ID 来获得算法。
最佳选择似乎是您列表中的 5 个。
如果您喜欢冒险......
并且,如果您可以重新编号您的 ID(即将分配的 ID 更改为另一个未分配的 ID),您可以偶尔运行“碎片整理”类型的迭代,将所有分配的 ID 移动到降低值并找到下一次分配的最高空闲 ID 号。 这将有助于记住自上次运行“碎片整理”以来完成的分配和释放的总数。 从 0 开始按顺序分配增量。
这样您只需记住内存中的 3 个无符号短整数。 是的,根据它们的值偶尔进行一次成本稍高的重新分配迭代。
About your CRC like algorithm interest...
It appears that you are looking for an algorithm that will run through a random list of less than 64K 16-bit numbers and generate a new 16-bit number not already in the list; preferably doing this in a single pass through the given list.
If the sequence in which IDs are freed has no relation to the sequence in which they are allocated (like i feel is your case), there is nothing you can do with generation or allocation of the IDs to get your algorithm.
The best bet then seems to be 5 from your list.
If you are adventurous...
and, if you can re-number your IDs (that is change an allocated ID to another unallocated ID), you could run a 'defrag' kind of iteration once in a while to move all allocated IDs to lower values and find the highest free ID number for the next allocation. It would help to remember the total number of allocations and frees done since the last such 'defrag' run. Allocate incrementing sequentially starting from 0.
This way you need to remember just 3 unsigned short integers in memory. And yes, take a slightly costly re-allocation iteration once in a while based on their values.
另一种选择是在闪存驱动器上保留有效 ID 的文件。 这样,您就不必每次都查询所有可能性。 如果您想要随机 ID,请随机化文件。 将最后一个的偏移量存储为文件中的第一个数字。 当您需要一个时,从文件中删除最后一个,当您释放一个时,将其添加回文件中。 有了偏移量和闪存驱动器,无论剩余的 ID 数量有多少,它都应该是一个几乎恒定时间的操作。 作为奖励,如果您在任何时候需要知道的话,开头的偏移量会告诉您还剩下多少个 ID。 缺点是您必须访问每个 ID 的闪存(恒定时间,但仍然是访问),以及如果文件不存在如何处理错误情况。
Another option would be to keep a file of valid ids on the flash drive. This way, you aren't querying all the possibilities every time. If you want random IDs, then randomize the file. Store the offset to the final one as the first number in the file. When you need one, remove the last from the file, and when you free one, add it back to the file. With the offset and a flash drive, it should be a nearly constant-time operation no matter the number of IDs left. As a bonus, the offset at the beginning will tell you how many IDs you have left, if you need to know that at any point. The downside would be that you do have to access the flash memory for each ID (constant-time, but still an access), and how to handle the error case if the file is not present.