Pandas intervals overlapping with the next row

Posted on 2025-01-26 11:36:00


I need to group overlapping intervals. The data is already sorted, so I only need to check whether each row's interval overlaps the interval in the row right after it, not whether it overlaps any other row.

csv_file: https://drive.google.com/file/d/1-otkubu0ttqymbvejnvkggpqolxdlqm3/view?usp=sharing

ID  start       end
1   01-04-2011  01-04-2011
1a  01-04-2011  30-09-2011
2   01-01-2012  31-03-2012
3   01-04-2012  31-10-2012
4   01-11-2012  31-03-2013
6   01-04-2013  31-10-2013
6a  01-10-2013  31-03-2014
7   01-04-2014  31-10-2014
9   01-11-2014  31-03-2015
10  01-04-2015  31-05-2015
11  01-06-2015  31-10-2015
12  01-11-2015  31-03-2016
13  01-10-2016  31-03-2017
14  01-04-2017  30-09-2017
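A minimal sketch of loading the sample table above into pandas, assuming it is stored as whitespace-separated text (the names CSV_TEXT and load_intervals are illustrative, not from the question). Parsing with an explicit day-first format matters here: 01-04-2011 is 1 April 2011, and pd.Interval does not accept plain strings as endpoints.

```python
import io

import pandas as pd

# Sample rows from the question: whitespace-separated, dates as dd-mm-yyyy.
CSV_TEXT = """\
ID  start       end
1   01-04-2011  01-04-2011
1a  01-04-2011  30-09-2011
2   01-01-2012  31-03-2012
3   01-04-2012  31-10-2012
4   01-11-2012  31-03-2013
6   01-04-2013  31-10-2013
6a  01-10-2013  31-03-2014
7   01-04-2014  31-10-2014
9   01-11-2014  31-03-2015
10  01-04-2015  31-05-2015
11  01-06-2015  31-10-2015
12  01-11-2015  31-03-2016
13  01-10-2016  31-03-2017
14  01-04-2017  30-09-2017
"""


def load_intervals(text: str) -> pd.DataFrame:
    """Read the table and parse start/end as day-first dates."""
    # dtype=str keeps IDs such as "1" and "1a" as the same (string) type.
    df = pd.read_csv(io.StringIO(text), sep=r"\s+", dtype={"ID": str})
    for col in ("start", "end"):
        # An explicit format avoids month/day ambiguity.
        df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")
    return df


df = load_intervals(CSV_TEXT)
print(df.dtypes)
```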

ID1 has the same start and end date, which means it has no end yet.
I need to determine whether ID1 and ID1a overlap; if they don't, whether ID1a and ID2 overlap; and so on. (The IDs are oversimplified.)
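Because the rows are sorted by start, one way to express this adjacent-row check is to compare each row's end with the next row's start via Series.shift. This is a sketch assuming end dates are inclusive and that ID1 (start equal to end) is treated as a one-day interval; the column name overlaps_next is made up for illustration.

```python
import io

import pandas as pd

# Sample rows from the question (dd-mm-yyyy dates, already sorted by start).
CSV_TEXT = """\
ID  start       end
1   01-04-2011  01-04-2011
1a  01-04-2011  30-09-2011
2   01-01-2012  31-03-2012
3   01-04-2012  31-10-2012
4   01-11-2012  31-03-2013
6   01-04-2013  31-10-2013
6a  01-10-2013  31-03-2014
7   01-04-2014  31-10-2014
9   01-11-2014  31-03-2015
10  01-04-2015  31-05-2015
11  01-06-2015  31-10-2015
12  01-11-2015  31-03-2016
13  01-10-2016  31-03-2017
14  01-04-2017  30-09-2017
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+", dtype={"ID": str})
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# With rows sorted by start, row i overlaps row i + 1 exactly when the next
# start is on or before the current end. "<=" keeps end dates inclusive, so
# ID 1 (start == end) still touches ID 1a.
next_start = df["start"].shift(-1)
df["overlaps_next"] = next_start <= df["end"]  # last row compares with NaT -> False

print(df.loc[df["overlaps_next"], "ID"].tolist())  # ['1', '6']
```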

I've searched everywhere and can't seem to solve this.

Expected result: each group always consists of two intervals, and they will always be adjacent because the CSV is already sorted.

ID  start       end         overlap
1   01-04-2011  01-04-2011  Y_group1
1a  01-04-2011  30-09-2011  Y_group1

ID  start       end         overlap
6   01-04-2013  31-10-2013  Y_group2
6a  01-10-2013  31-03-2014  Y_group2
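One possible way to produce the Y_groupN labels shown above is to number each pair with a running count of the rows that overlap their successor. This sketch relies on the statement that a group is always exactly two adjacent rows; a chain of three or more overlapping rows would need different handling. The names starts_pair, ends_pair and in_pair are illustrative.

```python
import io

import pandas as pd

# Sample rows from the question (dd-mm-yyyy dates, already sorted by start).
CSV_TEXT = """\
ID  start       end
1   01-04-2011  01-04-2011
1a  01-04-2011  30-09-2011
2   01-01-2012  31-03-2012
3   01-04-2012  31-10-2012
4   01-11-2012  31-03-2013
6   01-04-2013  31-10-2013
6a  01-10-2013  31-03-2014
7   01-04-2014  31-10-2014
9   01-11-2014  31-03-2015
10  01-04-2015  31-05-2015
11  01-06-2015  31-10-2015
12  01-11-2015  31-03-2016
13  01-10-2016  31-03-2017
14  01-04-2017  30-09-2017
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+", dtype={"ID": str})
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

starts_pair = df["start"].shift(-1) <= df["end"]     # row overlaps the next row
ends_pair = starts_pair.shift(1, fill_value=False)  # row is the second of a pair
group_no = starts_pair.cumsum()                      # increments at each pair start
in_pair = starts_pair | ends_pair

df["overlap"] = ""
df.loc[in_pair, "overlap"] = "Y_group" + group_no[in_pair].astype(str)

# Print each group on its own, in file order.
for label, group in df[in_pair].groupby("overlap", sort=False):
    print(group[["ID", "start", "end", "overlap"]], end="\n\n")
```

groupby(..., sort=False) keeps the groups in file order; with the default sorting, the string label Y_group10 would come before Y_group2.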

I found this, but I got the same error (ValueError: need at least one array to concatenate).

I found this solution, but it returns a lot of True values. Is that because of ID1?

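import itertools
import pandas as pd

# Assumes df['start'] and df['end'] already hold datetimes (pd.Interval rejects strings).
intervals = df.apply(lambda row: pd.Interval(row['start'], row['end']), axis=1)
# product(..., repeat=2) yields every ordered pair (i, j), including i == j,
# so each interval is compared with itself and with non-adjacent rows too.
overlaps = [
    (i, j, x, y, x.overlaps(y))
    for ((i, x), (j, y))
    in itertools.product(enumerate(intervals), repeat=2)
]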
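One likely reason for the many True values is that itertools.product pairs every row with every row, itself included. A sketch that keeps the pd.Interval idea but compares only neighbours, assuming start and end are parsed as dates: IntervalIndex.from_arrays builds the intervals, and closed="both" is chosen so that the single-day interval for ID1 counts as touching ID1a. The variable name pairs is illustrative.

```python
import io

import pandas as pd

# Sample rows from the question (dd-mm-yyyy dates, already sorted by start).
CSV_TEXT = """\
ID  start       end
1   01-04-2011  01-04-2011
1a  01-04-2011  30-09-2011
2   01-01-2012  31-03-2012
3   01-04-2012  31-10-2012
4   01-11-2012  31-03-2013
6   01-04-2013  31-10-2013
6a  01-10-2013  31-03-2014
7   01-04-2014  31-10-2014
9   01-11-2014  31-03-2015
10  01-04-2015  31-05-2015
11  01-06-2015  31-10-2015
12  01-11-2015  31-03-2016
13  01-10-2016  31-03-2017
14  01-04-2017  30-09-2017
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+", dtype={"ID": str})
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# closed="both" makes ID 1 the closed interval [2011-04-01, 2011-04-01].
# With pandas' default closed="right", ID 1 and ID 1a would both be open at
# 2011-04-01 on the left, and Interval.overlaps would return False for them.
intervals = pd.IntervalIndex.from_arrays(df["start"], df["end"], closed="both")

# zip pairs each interval only with the one after it.
pairs = [
    (df["ID"].iloc[i], df["ID"].iloc[i + 1], a.overlaps(b))
    for i, (a, b) in enumerate(zip(intervals[:-1], intervals[1:]))
]
print([(x, y) for x, y, hit in pairs if hit])  # [('1', '1a'), ('6', '6a')]
```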
