在红宝石中计算汉明距离的最有效方法？

发布于 2024-11-16 04:59:12 字数 544 浏览 4 评论 0原文

在ruby中，计算两个无符号整数之间的位差（例如汉明距离）的最有效方法是什么？

例如，我有整数 a = 2323409845 和 b = 1782647144。

它们的二进制表示形式是：

a = 10001010011111000110101110110101
b = 01101010010000010000100101101000

a 和 b 之间的位差； b 是 17..

我可以对它们进行逻辑异或，但这会给我一个不同的整数！= 17，然后我必须迭代结果的二进制表示并计算 1 的数量。

计算位差最有效的方法是什么？

现在，计算多个整数序列的位差的答案是否会改变？例如，给定 2 个无符号整数序列：

x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}

计算两个序列之间的位差的最有效方法是什么？

您会迭代序列，还是有更快的方法来一次计算整个序列的差异？

原文

In ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (e.g. the hamming distance)?

Eg, I have integer a = 2323409845 and b = 1782647144.

Their binary representations are:

a = 10001010011111000110101110110101
b = 01101010010000010000100101101000

The bit difference between the a & b is 17..

I can do a logical XOR on them, but that will give me a different integer != 17, I would then have to iterate through the binary representation of the result and tally the # of 1s.

What is the most efficient way to calculate the bit difference?

Now, does the answer change for calculating the bit difference of sequences of many ints? E.g. given 2 sequences of unsigned integers:

x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}

What is the most efficient way to calculate the bit difference between the two sequences?

Would you iterate through the sequence, or is there a faster way to calculate the difference across the entire sequence at once?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

满意归宿 2024-11-23 04:59:12

您可以利用 Ruby 中优化的 String 函数来进行位计数，而不是纯粹的算术。事实证明，通过一些快速基准测试，速度提高了约 6 倍。

def h2(a, b)
  (a^b).to_s(2).count("1")
end

h1是正常的计算方式，而h2则将异或转换为字符串，统计“1”的个数

Benchmark：

ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
   2.060000   0.000000   2.060000 (  1.944690)
--------------------------- total: 2.060000sec

       user     system      total        real
   1.990000   0.000000   1.990000 (  1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
   0.340000   0.000000   0.340000 (  0.333673)
--------------------------- total: 0.340000sec

       user     system      total        real
   0.320000   0.000000   0.320000 (  0.326854)
# => nil
ruby-1.9.2-p180:017:0>>

You can make use of the optimized String functions in Ruby to do the bit counting, instead of pure arithmetic. It turns out to be about 6 times faster with some quick benchmarking.

def h2(a, b)
  (a^b).to_s(2).count("1")
end

h1 is the normal way to calculate, while h2 converts the xor into a string, and counts the number of "1"s

Benchmark:

ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
   2.060000   0.000000   2.060000 (  1.944690)
--------------------------- total: 2.060000sec

       user     system      total        real
   1.990000   0.000000   1.990000 (  1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
   0.340000   0.000000   0.340000 (  0.333673)
--------------------------- total: 0.340000sec

       user     system      total        real
   0.320000   0.000000   0.320000 (  0.326854)
# => nil
ruby-1.9.2-p180:017:0>>

回复收藏 0 原文

執念 2024-11-23 04:59:12

根据 mu 太短的建议，我编写了一个简单的 C 扩展来使用 __builtin_popcount ，并使用基准测试验证它至少比 ruby 的优化字符串函数快 3 倍。

我查看了以下两个教程：

在我的程序中：

require './FastPopcount/fastpopcount.so'
include FastPopcount

def hamming(a,b)
  popcount(a^b)
end

然后在包含我的程序的目录中，我创建一个包含以下文件的文件夹“PopCount”。

extconf.rb：

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'fastpopcount'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

popcount.c：

// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount();

// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(int argc, VALUE *argv, VALUE self);

// The initialization method for this module
void Init_fastpopcount() {
    FastPopcount = rb_define_module("FastPopcount");
    rb_define_method(FastPopcount, "popcount", method_popcount, 1); 
}

// Our 'popcount' method.. it uses the builtin popcount
VALUE method_popcount(int argc, VALUE *argv, VALUE self) {
    return INT2NUM(__builtin_popcount(NUM2UINT(argv)));
}

然后在popcount目录中运行：

ruby extconf.rb
然后运行

程序，你就得到了......在 ruby 中进行汉明距离的最快方法。

Per the suggestion of mu is too short, I wrote a simple C extension to use __builtin_popcount , and using benchmark verified that it is at least 3X faster than ruby's optimized string functions..

I looked at the following two tutorials:

In my program:

require './FastPopcount/fastpopcount.so'
include FastPopcount

def hamming(a,b)
  popcount(a^b)
end

Then in the dir containing my program, I create a folder "PopCount" with the following files.

extconf.rb:

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'fastpopcount'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

popcount.c:

// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount();

// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(int argc, VALUE *argv, VALUE self);

// The initialization method for this module
void Init_fastpopcount() {
    FastPopcount = rb_define_module("FastPopcount");
    rb_define_method(FastPopcount, "popcount", method_popcount, 1); 
}

// Our 'popcount' method.. it uses the builtin popcount
VALUE method_popcount(int argc, VALUE *argv, VALUE self) {
    return INT2NUM(__builtin_popcount(NUM2UINT(argv)));
}

Then in the popcount directory run:

ruby extconf.rb
make

Then run the program, and there you have it....fastest way to do hamming distance in ruby.

回复收藏 0 原文

心碎无痕… 2024-11-23 04:59:12

韦格纳的算法：

def hamm_dist(a, b)
  dist = 0
  val = a ^ b

  while not val.zero?
    dist += 1
    val &= val - 1
  end
  dist
end

p hamm_dist(2323409845, 1782647144) # => 17

An algorithm of Wegner:

def hamm_dist(a, b)
  dist = 0
  val = a ^ b

  while not val.zero?
    dist += 1
    val &= val - 1
  end
  dist
end

p hamm_dist(2323409845, 1782647144) # => 17

回复收藏 0 原文