当我们散列特定字符串或单词时,真正发生的事情(实际过程)

发布于 2025-01-05 00:49:54 字数 329 浏览 1 评论 0原文

您好,我正在尝试用 java 开发一个计数布隆过滤器。我确实搜索了有关布隆过滤器的大部分资源。我理解的是,当我们对特定字符串或单词进行散列(散列)时,散列的结果将返回一个值,以便我们可以将内容存储在该结果值中地方。 但我的大问题是如何进行哈希(算法)。当我们对特定的字符串或单词进行哈希处理时,到底会发生什么。您能否解释一下,当我们对特定字符串或单词进行哈希处理时,到底会发生什么(例如,当我们对特定字符串或单词进行哈希处理时,特定的最终值是如何到达的)。我还读到也有发生碰撞的机会。您还可以解决为什么生成的哈希值不唯一(为什么它有时会为不同的输入返回相同的哈希值)。我真的需要编写代码来进行散列吗?或者java中是否有任何内置函数来进行散列。

Hi am trying to develop a counting bloom filter in java. i really searched most of the sources about the bloom filter.. The thing i understood is when we hash (do hashing) the particular string or word, the result of hashing will return one value so that we can store the content in that resultant value place.
But my big question is how to do the hashing (the algorithm). What really happens when we hash a particular string or word. Can u please explain me what really happens when we hash a particular string or word (Like how the particular final value arrives when we do hashing on particular string or word). I also read there is also chances for collision. Can you also address, Why the resultant hash value is not unique (Why its sometimes returns same hash value for different inputs). And do i really need to write the code to do hashing or is there any inbuilt functions in java to do hashing.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

一抹微笑 2025-01-12 00:49:55

您只需在任何对象上调用 hashCode() 即可获取哈希码。特别是对于 javadoc

公共 int hashCode()

返回该字符串的哈希码。 String 对象的哈希码
计算为

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

使用int算术,其中s[i]是字符串的第i个字符,n
是字符串的长度,^表示求幂。 (哈希值
空字符串的值为零。)

You can simply get a hash code by calling hashCode() on any object. In particular for class String from javadoc:

public int hashCode()

Returns a hash code for this string. The hash code for a String object
is computed as

s[0]*31^(n-1) + s[ 1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n
is the length of the string, and ^ indicates exponentiation. (The hash
value of the empty string is zero.)

树深时见影 2025-01-12 00:49:55

“哈希”是一个函数

H: I -> O

通常,集合 IO 更大或更复杂。在哈希表中,I 是元素的类,O 是正整数集。特别是,在布隆过滤器中,您有 n 个不同的函数。要开发哈希函数,您需要提取相似对象的不同特征。例如,对于字符串,您可以具有:

  • 长度
  • 第一个字符的
  • 特定字符出现的次数
  • 作为多项式计算的字符串 h(S) = sum (s(i)*31^i) mod d

当使用多个哈希特性时,应避免冲突,例如使用 number of voyelsnumber of non-voyels 并没有多大帮助。哈希函数必须具备一些特征,请查看维基百科条目

"Hashing" is a function

H: I -> O

Where usually the set I is much bigger or more complex than O. In hash table I is the class of your elements is, and O is the set of positive integers. Particularly, in a bloom filter you have n different functions. To develop a hash function you need to extract different characteristics of similar objects. For example, for character strings you can have :

  • the length
  • the first character
  • the number of occurrences of a specific character
  • the string evaluated as a polynomial h(S) = sum (s(i)*31^i) mod d

When using multiple hash collision of characteristics should be avoided, for example using number of voyels and number of non-voyels is not really helpful. There are some characteristics to a hash function must have, look at the wikipedia entry

囍孤女 2025-01-12 00:49:55

String 执行的代码如下:

public int hashCode() {
int h = hash;
    int len = count;
if (h == 0 && len > 0) {
    int off = offset;
    char val[] = value;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

哈希是一个函数(不是双射),因此不同的输入可以产生相同的结果。这是哈希函数的基础知识

The code executed for String is this one:

public int hashCode() {
int h = hash;
    int len = count;
if (h == 0 && len > 0) {
    int off = offset;
    char val[] = value;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Hash is a function (not a bijection) and therefore, different inputs can produce the same result. This is the basics of hash functions

半透明的墙 2025-01-12 00:49:55

Java 允许您重写类的 hashCode() 方法以使用哈希算法

public class Employee {


   // Default implementation might want to use "name" for as part of hashCode
   private String name; 

   @Override
   public int hashCode() {
     // We know that ID is always unique, so don't use name in calculating 
     // the hash code. & hashCode() is an int
     return id;
   }
}

*(如果您要重写 hashCode,您还应该重写 equals。)

哈希码是根据存储在集合中的每个对象计算的。
它是使用标准算法计算的。
您确实可以在每个对象的基础上重写 hashcode 方法。
实现 hashcode 方法的一种方法是使用 HashcodeBuilder。

希望这有帮助。在stackoverflow中搜索更多与本文相关的内容,可以获得更多描述性的答案。

Java allows you to override the hashCode() method for your Classes to use a hashing algorithm

public class Employee {


   // Default implementation might want to use "name" for as part of hashCode
   private String name; 

   @Override
   public int hashCode() {
     // We know that ID is always unique, so don't use name in calculating 
     // the hash code. & hashCode() is an int
     return id;
   }
}

*(if you are going to override hashCode you should also override equals.)

The hashcode is computed per object stored in the collection.
It is computed using a standard algorithm.
You can indeed override the hashcode method on a per object basis.
one way to implement a hashcode method is using HashcodeBuilder.

Hope this helps. Search more in stack overflow related to this article ,you can get more descriptive answers.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文