How do I create a unique identifier in Perl?

Posted on 2024-07-25 23:52:41

I am creating a file-oriented database of test results produced by various users. For this I need to generate a unique ID for every entry in the database. The IDs must satisfy the following requirements:

  • IDs should be fairly small (6 characters at most)
  • The same ID should be generated every time for a given test case and user combination

What I tried was a simple BKDR hash function with a seed value of 31, using the ord() function as follows:

@chars = split(//,$hash_var);

$hash = 0;
$seed = 31;

foreach $char ( @chars ) {
   if( $char !~ m/\d/ ) {
       $hash = ( $seed * $hash ) + ord( $char );
   }  
   else {
       $hash = ( $seed * $hash ) + $char ;
   }
}

$hash = ( $hash & 0x7FFFFFFF ) % 1000;
$hash = "$chars[0]$chars[$#chars]$hash" ;

This sometimes produces the same result for different combinations, i.e. the IDs are not unique. Is there any other way to accomplish this? Does changing the seed value help achieve uniqueness?

Comments (3)

去了角落 2024-08-01 23:52:41

Do you have more than 256 users and/or more than 65536 test cases per user? If not, you can just index users from 0 .. 255 and test cases from 0 .. 65535 and encode the pair as a string of hexadecimal digits: two digits for the user and four for the test case, six characters in total.

If you have more users or test cases than that, I would again index the users and test cases and then combine the two indexes into a 32-bit integer, which takes only 4 bytes and is trivial to implement, though slightly harder for humans to read.

In any case, I am assuming you are given the user name and test case information. Just keep two tied hashes, %users and %cases, to map users and test cases to their index numbers.
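
The indexing scheme could be sketched roughly as follows. This is only an illustration: the helper names index_for and make_id are made up, plain in-memory hashes stand in for the tied hashes (which would be needed for the mapping to survive between runs), and test cases are indexed globally rather than per user.

```perl
use strict;
use warnings;

# Hypothetical lookup tables: name -> index. Plain hashes are used here;
# tied hashes would be needed to keep the mapping between runs.
my ( %users, %cases );

# Return the existing index for $name, or assign the next free one.
sub index_for {
    my ( $table, $name, $limit ) = @_;
    unless ( exists $table->{$name} ) {
        my $next = scalar keys %$table;
        die "index space exhausted (limit $limit)\n" if $next >= $limit;
        $table->{$name} = $next;
    }
    return $table->{$name};
}

# Two hex digits for the user (0..255) plus four for the test case (0..65535).
sub make_id {
    my ( $user, $case ) = @_;
    return sprintf '%02x%04x',
        index_for( \%users, $user, 256 ),
        index_for( \%cases, $case, 65536 );
}

print make_id( 'alice', 'login_test' ), "\n";    # prints 000000
print make_id( 'bob',   'login_test' ), "\n";    # prints 010000
print make_id( 'alice', 'login_test' ), "\n";    # prints 000000 again
```

Because each ID is built from assigned indexes rather than a hash, two different user/test pairs can never share an ID as long as both index tables are preserved.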

北渚 2024-08-01 23:52:41

Part of your problem may be that you are using floating-point math, whereas BKDR almost certainly expects integer math (Perl switches to floating point once a value overflows its native integer range). You can fix that bug by saying:

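my @chars = split(//,$hash_var);

my $hash = 0;
my $seed = 31;

for my $char ( @chars ) {
   # Integer arithmetic within this block: overflow typically wraps
   # around instead of promoting $hash to floating point.
   use integer;
   if( $char !~ m/\d/ ) {
       # Non-digit: mix in the character's code point.
       $hash = ( $seed * $hash ) + ord( $char );
   }
   else {
       # Digit: mix in its numeric value.
       $hash = ( $seed * $hash ) + $char ;
   }
}

# Keep the low 31 bits, then reduce to the range 0..999.
$hash = ( $hash & 0x7FFFFFFF ) % 1000;
# The ID is the first character, the last character, and the reduced hash.
$hash = "$chars[0]$chars[$#chars]$hash" ;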

Another tweak that might help is using characters other than the first and last. If the first and last characters tend to be the same, they add no uniqueness to the hash.

You may also want to use a better hash function like MD5 (available in Digest::MD5) and trim the result to your desired size. However, the fact that you are using a hash at all means that you run the risk of having a collision.
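
A minimal sketch of the MD5 approach, assuming the user and test case names are joined with a "|" separator and that the inputs are byte strings (md5_hex rejects wide characters, so non-ASCII input would need encoding first). The short_id helper and the %seen registry are illustrative names, not part of Digest::MD5.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical registry of IDs already handed out: id -> key that owns it.
my %seen;

# Derive a 6-character ID from the first 24 bits of the MD5 digest.
sub short_id {
    my ( $user, $test ) = @_;
    my $key = "$user|$test";                  # separator is an assumption
    my $id  = substr( md5_hex($key), 0, 6 );

    # Truncation makes collisions possible, so detect them explicitly.
    if ( exists $seen{$id} && $seen{$id} ne $key ) {
        die "ID collision: '$key' and '$seen{$id}' both map to $id\n";
    }
    $seen{$id} = $key;
    return $id;
}

print short_id( 'user1', 'test1' ), "\n";
```

Six hex characters give 2^24 (about 16.7 million) possible IDs, so by the birthday bound a collision becomes more likely than not at roughly 4,800 entries. The explicit check catches a collision when it happens but does not prevent it.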

み零 2024-08-01 23:52:41

If you don't have a lot of users/test cases, a simple solution like this might be enough. You'd have to add the limit check (and probably pack the integer when storing it).

vinko@parrot:~# more hash.pl
use strict;
use warnings;

my %hash;
my $count = 0;

sub getUniqueId {

        my $_user = shift;
        my $_test = shift;
        my $val;

        my $key = $_user."|".$_test;
        if (defined $hash{$key}) {
                $val = $hash{$key};
        } else {
                $hash{$key} = $count;
                $val = $count;
                $count = $count + 1;
        }
        return $val;
}

my @users = qw{ user1 user2 user3 user4 user5 user3 user5 };
my @testcases = qw{ test1 test2 test3 test1 test1 };

for my $user (@users) {
        for my $test (@testcases) {
                print "$user $test: ".getUniqueId($user,$test)."\n";
        }
}
vinko@parrot:~# perl hash.pl
user1 test1: 0
user1 test2: 1
user1 test3: 2
user1 test1: 0
user1 test1: 0
user2 test1: 3
user2 test2: 4
user2 test3: 5
user2 test1: 3
user2 test1: 3
user3 test1: 6
user3 test2: 7
user3 test3: 8
user3 test1: 6
user3 test1: 6
user4 test1: 9
user4 test2: 10
user4 test3: 11
user4 test1: 9
user4 test1: 9
user5 test1: 12
user5 test2: 13
user5 test3: 14
user5 test1: 12
user5 test1: 12
user3 test1: 6
user3 test2: 7
user3 test3: 8
user3 test1: 6
user3 test1: 6
user5 test1: 12
user5 test2: 13
user5 test3: 14
user5 test1: 12
user5 test1: 12
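
The limit and the packing mentioned above might look something like this. It is a sketch under assumptions: the 3-byte storage size (matching the six-hex-character budget) and the names MAX_ID, get_unique_id, pack_id and unpack_id are choices made for illustration.

```perl
use strict;
use warnings;

# Assumed limit: IDs are stored in 3 bytes, i.e. at most six hex characters.
use constant MAX_ID => 0xFFFFFF;

my %id_for;
my $next_id = 0;

# Same counter idea as above, but refusing to exceed the assumed limit.
sub get_unique_id {
    my ( $user, $test ) = @_;
    my $key = "$user|$test";
    unless ( exists $id_for{$key} ) {
        die "ID space exhausted\n" if $next_id > MAX_ID;
        $id_for{$key} = $next_id++;
    }
    return $id_for{$key};
}

# Pack as a 32-bit big-endian integer and drop the leading zero byte.
sub pack_id   { return substr( pack( 'N', $_[0] ), 1 ) }

# Restore the dropped zero byte before unpacking.
sub unpack_id { return unpack( 'N', "\0" . $_[0] ) }

my $id     = get_unique_id( 'user1', 'test1' );
my $packed = pack_id($id);
printf "%d -> %d bytes -> %d\n", $id, length($packed), unpack_id($packed);
```

As with the script above, the IDs stay the same across runs only if the key-to-ID mapping and the counter are saved and reloaded, for example alongside the database file.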