布隆过滤器什么时候有用?
我理解是什么让布隆过滤器成为一种有吸引力的数据结构;但是,我发现很难真正理解何时可以使用它们,因为您仍然必须执行您试图避免的昂贵操作,以确保您没有发现误报。因此,他们通常不会增加很多开销吗?例如,布隆过滤器的维基百科文章表明它们可以用于数据同步。我知道第一次当布隆过滤器为空时会很棒,但假设您没有更改任何内容,然后再次同步数据。现在,对布隆过滤器的每次查找都会报告文件已被复制,但是我们是否仍然需要执行我们试图避免的较慢的查找任务,以实际确保这是正确的?
I understand what makes bloom filters an attractive data structure; however, I'm finding it difficult to really understand when you can use them since you still have to perform the expensive operation you're trying to avoid to be certain that you haven't found a false positive. Because of this wouldn't they generally just add a lot of overhead? For example the wikipedia article for bloom filters suggests they can be used for data synchronization. I see how it would be great the first time around when the bloom filter is empty but say you haven't changed anything and you go to synchronize your data again. Now every lookup to the bloom filter will report that the file has already been copied but wouldn't we still have to preform the slower lookup task we were trying to avoid to actually make sure that's correct?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
基本上,您使用布隆过滤器来避免证明数据结构中不存在某项的漫长而艰巨的任务。确定某些东西是否丢失几乎总是比确定它是否存在更困难,因此过滤器有助于减少搜索你无论如何也找不到的东西时的损失。它并不总是有效,但当它有效时,你会获得巨大的好处。
Basically, you use Bloom filters to avoid the long and arduous task of proving an item doesn't exist in the data structure. It's almost always harder to determine if something is missing than if it exists, so the filter helps to shore up losses searching for things you won't find anyway. It doesn't always work, but when it does you reap a huge benefit.
布隆过滤器在成员资格查询的情况下非常有效,即找出一个元素是否属于该集合。集合中元素的数量不会影响查询性能。
Bloom filters are very efficient in the case of membership queries, i.e., to find out whether an element belongs to the set. The number of elements in the set does not affect the query performance.
一个常见的示例是将电子邮件地址添加到电子邮件列表中。您的应用程序应检查它是否已在联系人列表中,如果不在,则会出现一个弹出窗口,询问您是否要添加新收件人。要实现此目的,您通常在前端应用程序中遵循以下步骤:
布隆过滤器将以快速且节省内存的方式处理所有这些步骤。您可以使用字典进行快速查找,但这需要将整个联系人列表保存在密钥对中。对于如此大的联系人列表,您的浏览器中可能没有足够的存储空间
A common example is when you add an email address to the email list. your application should check whether it’s already in the contacts list, and if it’s not, a popup should appear asking you if you want to add the new recipient. To Implement this, you normally follow those steps in front end application:
Bloom filter will handle all those steps fast and memory-efficient way. You could use a dictionary for a fast lookup but that would require to save the entire contact list in key-pair. For such a large contact-list you might not have enough storage space in browser