如何生成字符串的长哈希值?

发布于 2025-01-06 07:55:21 字数 196 浏览 2 评论 0原文

我有一个java应用程序,我想在其中生成字符串的long id(以便将这些字符串存储在中Neo4j)。为了避免数据重复,我想为存储在long整数中的每个字符串生成一个id,该id对于每个字符串应该是唯一的。我怎样才能做到这一点?

I have a java applciation in which I want to generate long ids for strings (in order to store those strings in neo4j). In order to avoid data duplication, I would like to generate an id for each string stored in a long integer, which should be unique for each string. How can I do that ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

月下伊人醉 2025-01-13 07:55:21

这段代码将计算出相当好的哈希值:

String s = "some string";
long hash = UUID.nameUUIDFromBytes(s.getBytes()).getMostSignificantBits();

This code will calculate pretty good hash:

String s = "some string";
long hash = UUID.nameUUIDFromBytes(s.getBytes()).getMostSignificantBits();
橘亓 2025-01-13 07:55:21

为什么不看看 String 的 hashcode() 函数,而直接采用它来使用 long 值呢?

顺便提一句。如果有办法为每个字符串创建唯一的 ID,那么您就会找到一种压缩算法,能够将每个字符串打包为 8 个字节(根据定义不可能)。

Why don't you have a look a the hashcode() function of String, and just adopt it to using long values instead?

Btw. if there was a way to create a unique ID for each String, then you would have found a compression algorithm that would be able to pack every String into 8 bytes (not possible by definition).

淡紫姑娘! 2025-01-13 07:55:21

long 有 64 位。长度为 9 的String 有 72 位。来自 鸽子洞原理 - 您无法将 9 个字符长的字符串获取到 的唯一哈希值长

如果您仍然想要一个long哈希值:您可以为String->int采用两个标准[不同!]哈希函数,hash1()hash2() 并计算:hash(s) = 2^32* hash1(s) + hash2(s)

long has 64 bits. A String of length 9 has 72 bits. from pigeon hole principle - you cannot get a unique hashing for 9 chars long strings to a long.

If you still want a long hash: You can just take two standard [different!] hash functions for String->int, hash1() and hash2() and calculate: hash(s) = 2^32* hash1(s) + hash2(s)

冰雪之触 2025-01-13 07:55:21

简单的 64 位哈希可以通过将 CRC32 与 Adler32 组合来实现,尽管它们不是为哈希而设计的。当然,这种组合不如现代哈希技术那么强大,但对于本身提供 CRC 库的语言来说,它是可移植的。

Java 中的示例:

package com.example;

import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class MySimpleHash {

    /**
     * Calculate a 64 bits hash by combining CRC32 with Adler32.
     * 
     * @param bytes a byte array
     * @return a hash number
     */
    public static long getHash(byte[] bytes) {

        CRC32 crc32 = new CRC32();
        Adler32 adl32 = new Adler32();

        crc32.update(bytes);
        adl32.update(bytes);

        long crc = crc32.getValue();
        long adl = adl32.getValue();

        return (crc << 32) | adl;
    }

    public static void main(String[] args) {
        String string = "This is a test string";
        long hash = getHash(string.getBytes());
        System.out.println("output: " + hash);
    }
}
output: 7732385261082445741

Python 中的示例:

#!/usr/bin/python3

import zlib

def get_hash(bytes):
    return zlib.crc32(bytes) << 32 | zlib.adler32(bytes)

string = "This is a test string"
hash = get_hash(string.encode())
print("output:", hash)
output: 7732385261082445741

此要点比较了一些哈希方法:
https://gist.github.com/fabiolimace/507eac3d35900050eeb9772e5b1871ba

A simple 64 bits hash can be implemented by combining CRC32 with Adler32, although they are not made for hashing. Of course the combination is not as strong as modern hash techniques, but it is portable for languages that natively provide a library for CRC.

Example in Java:

package com.example;

import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class MySimpleHash {

    /**
     * Calculate a 64 bits hash by combining CRC32 with Adler32.
     * 
     * @param bytes a byte array
     * @return a hash number
     */
    public static long getHash(byte[] bytes) {

        CRC32 crc32 = new CRC32();
        Adler32 adl32 = new Adler32();

        crc32.update(bytes);
        adl32.update(bytes);

        long crc = crc32.getValue();
        long adl = adl32.getValue();

        return (crc << 32) | adl;
    }

    public static void main(String[] args) {
        String string = "This is a test string";
        long hash = getHash(string.getBytes());
        System.out.println("output: " + hash);
    }
}
output: 7732385261082445741

Example in Python:

#!/usr/bin/python3

import zlib

def get_hash(bytes):
    return zlib.crc32(bytes) << 32 | zlib.adler32(bytes)

string = "This is a test string"
hash = get_hash(string.encode())
print("output:", hash)
output: 7732385261082445741

This Gist compares some hash methods:
https://gist.github.com/fabiolimace/507eac3d35900050eeb9772e5b1871ba

別甾虛僞 2025-01-13 07:55:21

有很多答案,请尝试以下方法:

或者,按照之前的建议,查看来源。

附言。另一种技术是维护字符串字典:由于您不太可能很快获得 264 个字符串,因此您可以拥有完美的映射。但请注意,该映射也可能成为主要瓶颈。

There are many answers, try the following:

Or, as suggested before, check out the sources.

PS. One more technique is to maintain a dictionary of strings: since you're unlikely to get 264 strings any time soon, you can have perfect mapping. Note though that that mapping may as well become a major bottleneck.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文