查找序列中缺失和重叠的数字

发布于 2024-11-28 19:36:31 字数 759 浏览 0 评论 0原文

假设我们有一个这样的数据结构:

var sequences = new List<Tuple<int, int>>
                {
                    new Tuple<int, int>(1, 10),
                    new Tuple<int, int>(8, 101),
                    new Tuple<int, int>(102, 103),
                    new Tuple<int, int>(104, 104),
                    new Tuple<int, int>(110, 200)
                };

我想从这个集合中得到两个结果:

  • 所有缺失的数字(在本例中:105、106、107、108、109)
  • 所有重叠的数字(在本例中:8) , 9, 10)

我可以编写一个带有几个循环和辅助集合的算法。这当然可以,但我想知道这是否可以通过 LINQ 和/或其他更简单、更短的算法来实现?

编辑: 上面示例中的我的数据结构表示 5 个序列,第一个包含从 1 到 10 的数字,第二个包含从 8 到 101 的数字,依此类推...因为在生产中序列可能会更大(最多数以百万计),它们不是用实际的集合(例如包含所有数字的列表)表示的,而是用元组表示的,元组表示每个序列的最小和最大数量。

Let's say we have a data structure like this:

var sequences = new List<Tuple<int, int>>
                {
                    new Tuple<int, int>(1, 10),
                    new Tuple<int, int>(8, 101),
                    new Tuple<int, int>(102, 103),
                    new Tuple<int, int>(104, 104),
                    new Tuple<int, int>(110, 200)
                };

I would like to get two results from this collection:

  • All the missing numbers (in this example: 105, 106, 107, 108, 109)
  • All the overlapping numbers (in this example: 8, 9, 10)

I could write an algorithm with a couple of loops and helper collections. This off course would work, but I wonder if this could be achieved with the help of LINQ and/or other easier and shorter algorithms?

Edit:
My data structure from the example above is representing 5 sequences, the first one containing numbers from 1 to 10, the second one containing the numbers from 8 to 101 and so on... Because in the production the sequences can be much bigger (up to millions), they are not represented with the actual collection (for instance with Lists with all the numbers), but with Tuples, which represents the minimal and the maximal number of each sequence.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

梦明 2024-12-05 19:36:32

有一些边缘情况我只能假设你希望如何处理。我选择不处理其中之一(在代码中注释)。由于您没有给出如何表示丢失/重叠序列的指示,因此我选择了您自己的格式,使用元组来标识序列的开始和结束。

//Assumes they are sorted on item1
        Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>> FindMissingAndOverLapping(IEnumerable<Tuple<int,int>> sequences){
            var previous = Tuple.Create(0, 0);
            var missing = new List<Tuple<int,int>>();
            var overlapping = new List<Tuple<int, int>>();
            var max = 0;
            foreach (var sequence in sequences){
                var end = previous.Item2;
                max = end > max ? end : max;
                if (previous.Item2 < sequence.Item1 + 1){
                    missing.Add(Tuple.Create(previous.Item2 + 1, sequence.Item1 - 1));
                } else if (max < sequence.Item1){
                    overlapping.Add(Tuple.Create(sequence.Item1, max));
                }
            }
            //The sequences in ovrelapping can be ovrelapping them self
            return new Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>>(missing, overlapping);
        }

There's a few edge cases I can only assume how you wish to handle. I've chosen not to handle one of them (commented in the code). Since you've given no indication of how you would want to represent missing/ovrelapping sequences I've chosen your own format using tuples to identify the start and end of the sequence.

//Assumes they are sorted on item1
        Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>> FindMissingAndOverLapping(IEnumerable<Tuple<int,int>> sequences){
            var previous = Tuple.Create(0, 0);
            var missing = new List<Tuple<int,int>>();
            var overlapping = new List<Tuple<int, int>>();
            var max = 0;
            foreach (var sequence in sequences){
                var end = previous.Item2;
                max = end > max ? end : max;
                if (previous.Item2 < sequence.Item1 + 1){
                    missing.Add(Tuple.Create(previous.Item2 + 1, sequence.Item1 - 1));
                } else if (max < sequence.Item1){
                    overlapping.Add(Tuple.Create(sequence.Item1, max));
                }
            }
            //The sequences in ovrelapping can be ovrelapping them self
            return new Tuple<IEnumerable<Tuple<int,int>>,IEnumerable<Tuple<int,int>>>(missing, overlapping);
        }
念三年u 2024-12-05 19:36:31

你可以通过

var missing = 
      Enumerable.Range(1, 200)
               .Where(i => sequences.All(t => t.Item1 > i || t.Item2 < i));
var overlapping = 
      Enumerable.Range(1, 200)
                .Where(i => sequences.Count(t => t.Item1 <= i && t.Item2 >= i) > 1);

You could do it via

var missing = 
      Enumerable.Range(1, 200)
               .Where(i => sequences.All(t => t.Item1 > i || t.Item2 < i));
var overlapping = 
      Enumerable.Range(1, 200)
                .Where(i => sequences.Count(t => t.Item1 <= i && t.Item2 >= i) > 1);
晒暮凉 2024-12-05 19:36:31

我知道这个问题的算法(它是伪代码)。 (复杂度类 O(nlog(n)) 其中 n 是元组的计数)

因此解决方案是按函数对元组进行排序:

  int comparer( Tuple a, Tuple b) {
      if ( a.first.compareTo(b.first) == 0 ) {
          return a.second.compareTo(b.second);
      } else 
          return a.first.compareTo(b.first);
  }

例如元组: (1, 10), (1, 5), (2 , 8) 将排序为:
(1,5)、(1、10)、(2、8)。

下一步是累积这个结果。迭代这个结果并且:

 Tuple result = SortedList[0];
 foreach ( Tuple tuple in SortedList ) {

     if ( result.second < tuple.first ) {

        // here you have missing number (result.second, tuple.first)

        result.first = tuple.first; 
        result.second = tuple.second
     } else if ( result.second > tuple.first ) {

        // here you have overlapping number (tuple.first, min( result.second,tuple.second ))

        if ( result.second < tuple.second ) {
              result.second = tuple.second;
        }
     } else {
        result.second = tuple.second;
     }

 }

我们知道,如果将迭代下一个元组,第一个数字大于或等于 result.first。代码中的注释会告诉您哪里有重叠和缺失的数字

I know algorithm for this problem (it is pseudoCode). ( Complexity classes O(nlog(n)) where n is count of tuple)

So solution is sort Tuple by function:

  int comparer( Tuple a, Tuple b) {
      if ( a.first.compareTo(b.first) == 0 ) {
          return a.second.compareTo(b.second);
      } else 
          return a.first.compareTo(b.first);
  }

so example tuple: (1, 10), (1, 5), (2, 8) will be sort to:
(1,5), (1, 10), (2, 8).

Next step is accumulate this result. Iterate on this result and :

 Tuple result = SortedList[0];
 foreach ( Tuple tuple in SortedList ) {

     if ( result.second < tuple.first ) {

        // here you have missing number (result.second, tuple.first)

        result.first = tuple.first; 
        result.second = tuple.second
     } else if ( result.second > tuple.first ) {

        // here you have overlapping number (tuple.first, min( result.second,tuple.second ))

        if ( result.second < tuple.second ) {
              result.second = tuple.second;
        }
     } else {
        result.second = tuple.second;
     }

 }

What we know, that if will be iterate next tuple first number is bigger or equals than result.first. Commentate in code tell you where you have overlapping and missing number

π浅易 2024-12-05 19:36:31

试试这个

var expandedSequences = sequences.Select(t => Enumerable.Range(t.Item1, t.Item2-t.Item1)).SelectMany(t => t).OrderBy(i => i);
var dupes = expandedSequences.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key);
var missing = Enumerable.Range(expandedSequences.Min(), expandedSequences.Max()).Except(expandedSequences);

try this

var expandedSequences = sequences.Select(t => Enumerable.Range(t.Item1, t.Item2-t.Item1)).SelectMany(t => t).OrderBy(i => i);
var dupes = expandedSequences.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key);
var missing = Enumerable.Range(expandedSequences.Min(), expandedSequences.Max()).Except(expandedSequences);
妄司 2024-12-05 19:36:31

一次性完成:

var sequences = new List<Tuple<int, int>>
    {
        new Tuple<int, int>(1, 10),
        new Tuple<int, int>(8, 101),
        new Tuple<int, int>(102, 103),
        new Tuple<int, int>(104, 104),
        new Tuple<int, int>(110, 200)
    };
var missing = new List<int>();
var overlap = new List<int>();

sequences.Aggregate((prev, current) => {
    if (prev.Item2 >= current.Item1) {
        overlap.AddRange(Enumerable.Range(current.Item1, prev.Item2 - current.Item1 + 1));
    }
    if (current.Item1 > prev.Item2 + 1) {
        missing.AddRange(Enumerable.Range(prev.Item2 + 1, current.Item1 - prev.Item2 - 1));
    }
    return current;
});

In one pass:

var sequences = new List<Tuple<int, int>>
    {
        new Tuple<int, int>(1, 10),
        new Tuple<int, int>(8, 101),
        new Tuple<int, int>(102, 103),
        new Tuple<int, int>(104, 104),
        new Tuple<int, int>(110, 200)
    };
var missing = new List<int>();
var overlap = new List<int>();

sequences.Aggregate((prev, current) => {
    if (prev.Item2 >= current.Item1) {
        overlap.AddRange(Enumerable.Range(current.Item1, prev.Item2 - current.Item1 + 1));
    }
    if (current.Item1 > prev.Item2 + 1) {
        missing.AddRange(Enumerable.Range(prev.Item2 + 1, current.Item1 - prev.Item2 - 1));
    }
    return current;
});
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文