将整数集转换为范围

发布于 2024-08-22 12:31:08 字数 814 浏览 3 评论 0 原文

将一组整数转换为一组范围的最惯用的方法是什么?

例如,给定集合 {0, 1, 2, 3, 4, 7, 8, 9, 11} 我想得到 { {0,4}, {7,9}, {11,11} }。

假设我们正在从 std::set 转换为 std::vector>。 我将范围视为包含双方,因为它在我的情况下更方便,但如果需要,我也可以使用开放式范围。

我已经编写了以下函数,但我想重新发明轮子。 请告诉我 STL 或 boost 中可能有一些东西可以解决这个问题。

typedef std::pair<int, int> Range;

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
    Range r = std::make_pair(-INT_MAX, -INT_MAX);

    BOOST_FOREACH(int i, indices)
    {
           if (i != r.second + 1)
           {
            if (r.second >= 0) ranges.push_back(r);
            r.first = i;                    
           }

           r.second = i;
    }

    ranges.push_back(r);
}

What's the most idiomatic way to convert a set of integers into a set of ranges?

E.g. given the set {0, 1, 2, 3, 4, 7, 8, 9, 11} I want to get { {0,4}, {7,9}, {11,11} }.

Let's say we are converting from std::set<int> into std::vector<std::pair<int, int>>.
I treat Ranges as inclusive on both sides, since it's more convenient in my case, but I can work with open-ended ranges too if necessary.

I've written the following function, but I feel like reinventing the wheel.
Please tell maybe there's something in STL or boost for this.

typedef std::pair<int, int> Range;

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
    Range r = std::make_pair(-INT_MAX, -INT_MAX);

    BOOST_FOREACH(int i, indices)
    {
           if (i != r.second + 1)
           {
            if (r.second >= 0) ranges.push_back(r);
            r.first = i;                    
           }

           r.second = i;
    }

    ranges.push_back(r);
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

若言繁花未落 2024-08-29 12:31:08

现在可以使用 Boost.ICL 中的interval_set (Boost > 1.46)

#include <set>
#include <iostream>
#include <algorithm>

#include <boost/icl/discrete_interval.hpp>
#include <boost/icl/closed_interval.hpp>
#include <boost/icl/interval_set.hpp>

typedef std::set<int> Set;
typedef boost::icl::interval_set<int> IntervalSet;

void setToInterval(const Set& indices, IntervalSet& intervals)
{
    Set::const_iterator pos;
    for(pos = indices.begin(); pos != indices.end(); ++pos)
    {
        intervals.insert(boost::icl::construct<boost::icl::discrete_interval<int> >(*pos, *pos, boost::icl::interval_bounds::closed()));
    }
}

int main()
{
    std::cout << ">>Interval Container Library Rocks! <<\n";
    std::cout << "----------------------------------------------------\n";

    Set indices = {0, 1, 2, 3, 4, 7, 8, 9, 11};
    IntervalSet intervals;

    setToInterval(indices, intervals);

    std::cout << "  intervals joined:    " << intervals  << "\n";

    return 0;
}

输出:

  intervals joined:    {[0,4][7,9][11,11]}

Now one can use interval_set from Boost.ICL (Boost > 1.46)

#include <set>
#include <iostream>
#include <algorithm>

#include <boost/icl/discrete_interval.hpp>
#include <boost/icl/closed_interval.hpp>
#include <boost/icl/interval_set.hpp>

typedef std::set<int> Set;
typedef boost::icl::interval_set<int> IntervalSet;

void setToInterval(const Set& indices, IntervalSet& intervals)
{
    Set::const_iterator pos;
    for(pos = indices.begin(); pos != indices.end(); ++pos)
    {
        intervals.insert(boost::icl::construct<boost::icl::discrete_interval<int> >(*pos, *pos, boost::icl::interval_bounds::closed()));
    }
}

int main()
{
    std::cout << ">>Interval Container Library Rocks! <<\n";
    std::cout << "----------------------------------------------------\n";

    Set indices = {0, 1, 2, 3, 4, 7, 8, 9, 11};
    IntervalSet intervals;

    setToInterval(indices, intervals);

    std::cout << "  intervals joined:    " << intervals  << "\n";

    return 0;
}

Output:

  intervals joined:    {[0,4][7,9][11,11]}
苹果你个爱泡泡 2024-08-29 12:31:08

我认为 STL 或 Boost 中没有任何东西可以做到这一点。

您可以做的一件事是使您的算法更加通用:

template<class InputIterator, class OutputIterator>
void setToRanges(InputIterator first, InputIterator last, OutputIterator dest)
{
    typedef std::iterator_traits<InputIterator>::value_type item_type;
    typedef typename std::pair<item_type, item_type> pair_type;
    pair_type r(-std::numeric_limits<item_type>::max(), 
                -std::numeric_limits<item_type>::max());

    for(; first != last; ++first)
    {
        item_type i = *first;
        if (i != r.second + 1)
        {
            if (r.second >= 0) 
                *dest = r;
            r.first = i;                    
        }
        r.second = i;
    }
    *dest = r;
}

用法:

std::set<int> set;
// insert items

typedef std::pair<int, int> Range;
std::vector<Range> ranges;

setToRanges(set.begin(), set.end(), std::back_inserter(ranges));

您还应该考虑使用术语 interval 而不是 range,因为后者在 STL 术语中表示“可以通过迭代器或指针访问的任何对象序列”()。

最后,您可能应该看看 Boost 区间算术库,目前正在审核是否包含 Boost。

I don't think there's anything in the STL or Boost that does this.

One thing you can do is to make your algorithm a little bit more general:

template<class InputIterator, class OutputIterator>
void setToRanges(InputIterator first, InputIterator last, OutputIterator dest)
{
    typedef std::iterator_traits<InputIterator>::value_type item_type;
    typedef typename std::pair<item_type, item_type> pair_type;
    pair_type r(-std::numeric_limits<item_type>::max(), 
                -std::numeric_limits<item_type>::max());

    for(; first != last; ++first)
    {
        item_type i = *first;
        if (i != r.second + 1)
        {
            if (r.second >= 0) 
                *dest = r;
            r.first = i;                    
        }
        r.second = i;
    }
    *dest = r;
}

Usage:

std::set<int> set;
// insert items

typedef std::pair<int, int> Range;
std::vector<Range> ranges;

setToRanges(set.begin(), set.end(), std::back_inserter(ranges));

You should also consider using the term interval instead of range, because the latter in STL parlance means "any sequence of objects that can be accessed through iterators or pointers" (source).

Finally, you should probably take at look at the Boost Interval Arithmetic Library, which is currently under review for Boost inclusion.

随风而去 2024-08-29 12:31:08

恐怕没有收缩解决方案,但有替代算法。

将项目存储在位向量中 - 如果您知道开始时的最大项目并预分配向量,则时间复杂度为 O(n)。

将该向量转换为转换点标志向量 - 异或具有其自身位移版本的位向量。在单词边界上有点麻烦,但仍然是 O(n)。从逻辑上讲,您会在旧的 max + 1 处获得一个新密钥(在所有密钥耗尽后转换回零),因此在向量的预分配中考虑到这一点是个好主意。

然后,迭代位向量查找设置的位。第一个设置位指示范围的开始,第二个设置位指示结束,第三个设置位指示下一个范围的开始,依此类推。以下位调整函数(假设 32 位 int)可能有用......

int Low_Bit_No (unsigned int p)
{
  if (p == 0)  return -1;  //  No bits set

  int           l_Result = 31;
  unsigned int  l_Range  = 0xffffffff;
  unsigned int  l_Mask   = 0x0000ffff;

  if (p & l_Mask)  {  l_Result -= 16;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x00ff00ff;
  if (p & l_Mask)  {  l_Result -=  8;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x0f0f0f0f;
  if (p & l_Mask)  {  l_Result -=  4;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x33333333;
  if (p & l_Mask)  {  l_Result -=  2;  }  else  {  l_Mask ^= l_Range;  }
  l_Mask  &= 0x55555555;
  if (p & l_Mask)  {  l_Result -=  1;  }

  return l_Result;
}

No shrinkwrapped solution I'm afraid, but an alternative algorithm.

Store your items in a bitvector - O(n) if you know the maximum item at the start and preallocate the vector.

Translate that vector into a vector of transition point flags - exclusive-or the bitvector with a bitshifted version of itself. Slightly fiddly at the word boundaries, but still O(n). Logically, you get a new key at the old max + 1 (the transition back to zeros after all your keys are exhausted), so it's a good idea to allow for that in the preallocation of the vector.

Then, iterate through the bitvector finding the set bits. The first set bit indicates the start of a range, the second the end, the third the start of the next range and so on. The following bit-fiddling function (assuming 32 bit int) may be useful...

int Low_Bit_No (unsigned int p)
{
  if (p == 0)  return -1;  //  No bits set

  int           l_Result = 31;
  unsigned int  l_Range  = 0xffffffff;
  unsigned int  l_Mask   = 0x0000ffff;

  if (p & l_Mask)  {  l_Result -= 16;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x00ff00ff;
  if (p & l_Mask)  {  l_Result -=  8;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x0f0f0f0f;
  if (p & l_Mask)  {  l_Result -=  4;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x33333333;
  if (p & l_Mask)  {  l_Result -=  2;  }  else  {  l_Mask ^= l_Range;  }
  l_Mask  &= 0x55555555;
  if (p & l_Mask)  {  l_Result -=  1;  }

  return l_Result;
}
葬心 2024-08-29 12:31:08

我将使用 adjacent_find 和一个谓词,将“邻接”定义为两个不连续的元素。该解决方案不依赖于 INT_MAX。还是感觉有点笨拙。

bool notSequential(int a, int b) { return (a + 1) != b; }

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  int first;
  while (iter != end)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter != end)
    {
      ranges.push_back(std::make_pair(first, *iter));
      ++iter;
    }
  }
  ranges.push_back(std::make_pair(first, *--iter));
}

end 的测试超出了必要的范围。 adjacent_find 永远无法返回列表的最后一个元素,因此递增的迭代器永远不会 end ,因此仍然可以取消引用。它可以重写为:

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  if (iter == end) return; // empty set has no ranges
  int first;
  while (true)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter == end) break;
    ranges.push_back(std::make_pair(first, *iter++));
  }
  ranges.push_back(std::make_pair(first, *--iter));
}

I'd use adjacent_find with a predicate that defines "adjacency" as two elements that are not sequential. This solution doesn't depend on INT_MAX. Still feels kinda clunky.

bool notSequential(int a, int b) { return (a + 1) != b; }

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  int first;
  while (iter != end)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter != end)
    {
      ranges.push_back(std::make_pair(first, *iter));
      ++iter;
    }
  }
  ranges.push_back(std::make_pair(first, *--iter));
}

That tests against end more than necessary. adjacent_find can never return the last element of a list, so the incremented iterator will never be end and thus can still be dereferenced. It could be rewritten as:

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  if (iter == end) return; // empty set has no ranges
  int first;
  while (true)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter == end) break;
    ranges.push_back(std::make_pair(first, *iter++));
  }
  ranges.push_back(std::make_pair(first, *--iter));
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文