当前位置：文江博客话题详情

如何在C++中快速比较变长位串？

发布于 2024-10-14 01:03:40 字数 302 浏览 3 评论 0 原文

我正在根据一组特征的二进制存在或不存在来执行对象比较。这些特征可以用比特串来表示，例如：

该比特串具有第一、第四和第五特征。

我试图计算一对位串的相似度作为两者共有的特征数量。对于给定的一组位字符串，我知道它们都具有相同的长度，但我不知道在编译时该长度是多少。

例如，这两个字符串有两个共同特征，因此我希望相似度函数返回 2：

s(10011,10010) = 2

如何在 C++ 中有效地表示和比较位字符串？

原文

I'm performing comparisons of objects based on the binary presence or absence of a set of features. These features can be represented by a bit string, such as this:

This bitstring has the first, fourth and fifth feature.

I'm trying to calculate the similarity of a pair of bit strings as the number of features that both share in common. For a given set of bit strings I know that they'll all have the same length, but I don't know at compile time what that length will be.

For example, these two strings have two features in common, so I'd like the similarity function to return 2:

s(10011,10010) = 2

How do I efficiently represent and compare bit-strings in C++?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半葬歌 2024-10-21 01:03:40

您可以使用 std::bitset STL 类。

它们可以从位串构建，进行与运算，并计算 1 的数量：

#include <string>
#include <bitset>

int main()
{
  std::bitset<5> option1(std::string("10011")), option2(std::string("10010"));
  std::bitset<5> and_bit = option1 & option2; //bitset will have 1s only on common options
  size_t s = and_bit.count ();                //return the number of 1 in the bitfield
  return 0;
}

编辑

如果在编译时位数未知，您可以使用 boost::dynamic_bitset：

boost::dynamic_bitset<> option(bit_string);

示例的其他部分不变，因为 boost::dynamic_bitset 与 std::bitset 共享一个公共接口。

You can use the std::bitset STL class.

They can be built from bit strings, ANDed, and count the number of 1:

#include <string>
#include <bitset>

int main()
{
  std::bitset<5> option1(std::string("10011")), option2(std::string("10010"));
  std::bitset<5> and_bit = option1 & option2; //bitset will have 1s only on common options
  size_t s = and_bit.count ();                //return the number of 1 in the bitfield
  return 0;
}

EDIT

If number of bits is unknown at compile time, you can use boost::dynamic_bitset<>:

boost::dynamic_bitset<> option(bit_string);

Other parts of example don't change, since boost::dynamic_bitset<> share a common interface with std::bitset.

回复收藏 0 原文

纸短情长 2024-10-21 01:03:40

由于您在编译时不知道位长度，因此可以使用 boost::dynamic_bitset 而不是 std::bitset。

然后，您可以使用 operator&（或 &=）查找公共位，并使用 boost::dynamic_bitset::count() 对它们进行计数代码>.

性能取决于。为了获得最大速度，根据您的编译器，您可能必须自己实现循环，例如使用 @Nawaz 的方法，或来自 Bit Twiddling Hacks，或者使用 sse/popcount/etc 的汇编器/编译器内在函数编写循环。

请注意，至少 llvm、gcc 和 icc 会检测到许多此类模式并为您进行优化，因此在进行手动工作之前分析/检查生成的代码。

回复收藏 0 原文

所谓喜欢 2024-10-21 01:03:40

更快的算法：

int similarity(unsigned int a, unsigned int b)
{
   unsigned int r = a & b;
   r = ( r & 0x55555555 ) + ((r >> 1) & 0x55555555 );
   r = ( r & 0x33333333 ) + ((r >> 2) & 0x33333333 );
   r = ( r & 0x0f0f0f0f ) + ((r >> 4) & 0x0f0f0f0f );
   r = ( r & 0x00ff00ff ) + ((r >> 8) & 0x00ff00ff );
   r = ( r & 0x0000ffff ) + ((r >>16) & 0x0000ffff );
   return r;
}

int main() {
        unsigned int a = 19 ;//10011
        unsigned int b = 18 ;//10010
        cout << similarity(a,b) << endl; 
        return 0;
}

输出：

ideone 上的演示：http://www.ideone.com/bE4qb

Faster algorithm:

int similarity(unsigned int a, unsigned int b)
{
   unsigned int r = a & b;
   r = ( r & 0x55555555 ) + ((r >> 1) & 0x55555555 );
   r = ( r & 0x33333333 ) + ((r >> 2) & 0x33333333 );
   r = ( r & 0x0f0f0f0f ) + ((r >> 4) & 0x0f0f0f0f );
   r = ( r & 0x00ff00ff ) + ((r >> 8) & 0x00ff00ff );
   r = ( r & 0x0000ffff ) + ((r >>16) & 0x0000ffff );
   return r;
}

int main() {
        unsigned int a = 19 ;//10011
        unsigned int b = 18 ;//10010
        cout << similarity(a,b) << endl; 
        return 0;
}

Output:

Demonstration at ideone : http://www.ideone.com/bE4qb

回复收藏 0 原文

陈独秀 2024-10-21 01:03:40

使用std::bitset，如果你的特征集小于long中的位数（我认为它是long），你可以获得位的unsigned long表示，然后< em>和两个值，并使用此处来计数。

如果您想继续使用字符串来表示位模式，您可以使用 boost 中的 zip_iterator 执行类似以下操作。

#include <iostream>
#include <string>
#include <algorithm>

#include <boost/tuple/tuple.hpp>
#include <boost/iterator/zip_iterator.hpp>

struct check_is_set :
  public std::unary_function<const boost::tuple<char const&, char const&>&, bool>
{
  bool operator()(const boost::tuple<char const&, char const&>& t) const
  {
    const char& cv1 = boost::get<0>(t);
    const char& cv2 = boost::get<1>(t);
    return cv1 == char('1') && cv1 == cv2;
  }
};

size_t count_same(std::string const& opt1, std::string const& opt2)
{
  std::string::const_iterator beg1 = opt1.begin();
  std::string::const_iterator beg2 = opt2.begin();

  // need the same number of items for end (this really is daft, you get a runtime
  // error if the sizes are different otherwise!! I think it's a bug in the
  // zip_iterator implementation...)
  size_t end_s = std::min(opt1.size(), opt2.size());
  std::string::const_iterator end1 = opt1.begin() + end_s;
  std::string::const_iterator end2 = opt2.begin() + end_s;

  return std::count_if(
  boost::make_zip_iterator(
    boost::make_tuple(beg1, beg2)
    ),
  boost::make_zip_iterator(
    boost::make_tuple(end1, end2)
    ),
    check_is_set()
  );
}

int main(void)
{
  std::string opt1("1010111");
  std::string opt2("001101");

  std::cout << "same: " << count_same(opt1, opt2) << std::endl;

  return 0;
}

Use a std::bitset, if your set of features is less than the number of bits in a long (I think it's a long), you can get an unsigned long representation of the bits, then and the two values, and use bit twidling tricks from here to count.

If you want to continue to use strings to represent your bit pattern, you could do something like the following, using the zip_iterator from boost.

#include <iostream>
#include <string>
#include <algorithm>

#include <boost/tuple/tuple.hpp>
#include <boost/iterator/zip_iterator.hpp>

struct check_is_set :
  public std::unary_function<const boost::tuple<char const&, char const&>&, bool>
{
  bool operator()(const boost::tuple<char const&, char const&>& t) const
  {
    const char& cv1 = boost::get<0>(t);
    const char& cv2 = boost::get<1>(t);
    return cv1 == char('1') && cv1 == cv2;
  }
};

size_t count_same(std::string const& opt1, std::string const& opt2)
{
  std::string::const_iterator beg1 = opt1.begin();
  std::string::const_iterator beg2 = opt2.begin();

  // need the same number of items for end (this really is daft, you get a runtime
  // error if the sizes are different otherwise!! I think it's a bug in the
  // zip_iterator implementation...)
  size_t end_s = std::min(opt1.size(), opt2.size());
  std::string::const_iterator end1 = opt1.begin() + end_s;
  std::string::const_iterator end2 = opt2.begin() + end_s;

  return std::count_if(
  boost::make_zip_iterator(
    boost::make_tuple(beg1, beg2)
    ),
  boost::make_zip_iterator(
    boost::make_tuple(end1, end2)
    ),
    check_is_set()
  );
}

int main(void)
{
  std::string opt1("1010111");
  std::string opt2("001101");

  std::cout << "same: " << count_same(opt1, opt2) << std::endl;

  return 0;
}

回复收藏 0 原文

~没有更多了~

关于作者

太傻旳人生

暂无简介

0 文章

0 评论

22 人气

关注发私信

友情链接

文江博客

如何在C++中快速比较变长位串？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

1CH1MKgiKxn9p

ゞ记忆︶ㄣ

JackDx

信远

yaoduoduo1995

霞映澄塘

友情链接

如何在C++中快速比较变长位串？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

1CH1MKgiKxn9p

ゞ记忆︶ㄣ

JackDx

信远

yaoduoduo1995

霞映澄塘

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。