Pandas: intervals overlapping with the next row
I need to group overlapping intervals. The data is already sorted, so I only need to check whether each row overlaps the row immediately after it, not whether it overlaps any other row in the table.
ID start end
1 01-04-2011 01-04-2011
1a 01-04-2011 30-09-2011
2 01-01-2012 31-03-2012
3 01-04-2012 31-10-2012
4 01-11-2012 31-03-2013
6 01-04-2013 31-10-2013
6a 01-10-2013 31-03-2014
7 01-04-2014 31-10-2014
9 01-11-2014 31-03-2015
10 01-04-2015 31-05-2015
11 01-06-2015 31-10-2015
12 01-11-2015 31-03-2016
13 01-10-2016 31-03-2017
14 01-04-2017 30-09-2017
ID 1 has the same start and end date, which means it has no end date yet.
I need to determine whether ID 1 and ID 1a overlap; if not, whether ID 1a and the next row (ID 2) overlap, and so on. (The IDs are oversimplified.)
I've searched everywhere and can't seem to solve this.
The expected result is below. A group will always consist of two intervals, and they will always be next to each other because the CSV is already sorted.
ID start end overlap
1 01-04-2011 01-04-2011 Y_group1
1a 01-04-2011 30-09-2011 Y_group1
ID start end overlap
6 01-04-2013 31-10-2013 Y_group2
6a 01-10-2013 31-03-2014 Y_group2
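One way the expected output above could be produced is to compare each row's end with the next row's start and label every overlapping pair. This is only a minimal sketch under a few assumptions: the dates are day-first (`%d-%m-%Y`), the columns are whitespace-separated, a row with start equal to end is treated as open-ended, and rows without a partner get an empty label. The names `raw`, `open_ended`, `next_start` and `overlaps_next` are made up for illustration.

```python
import io
import pandas as pd

# Rows copied from the question; whitespace-separated layout is an assumption.
raw = """ID start end
1 01-04-2011 01-04-2011
1a 01-04-2011 30-09-2011
2 01-01-2012 31-03-2012
3 01-04-2012 31-10-2012
4 01-11-2012 31-03-2013
6 01-04-2013 31-10-2013
6a 01-10-2013 31-03-2014
7 01-04-2014 31-10-2014
9 01-11-2014 31-03-2015
10 01-04-2015 31-05-2015
11 01-06-2015 31-10-2015
12 01-11-2015 31-03-2016
13 01-10-2016 31-03-2017
14 01-04-2017 30-09-2017
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"ID": str})
for col in ("start", "end"):
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# Assumption: start == end marks an interval that has no end yet.
open_ended = df["start"] == df["end"]

# Start date of the following row (NaT for the last row).
next_start = df["start"].shift(-1)

# A row overlaps its successor when the successor starts on or before this row's end,
# or when this row is open-ended. The last row has no successor.
overlaps_next = next_start.notna() & ((next_start <= df["end"]) | open_ended)

# Walk the rows once and label each overlapping pair; a labelled pair is skipped
# as a whole so that every group has exactly two rows.
df["overlap"] = ""
group = 0
i = 0
while i < len(df) - 1:
    if overlaps_next.iloc[i]:
        group += 1
        df.loc[df.index[i:i + 2], "overlap"] = f"Y_group{group}"
        i += 2
    else:
        i += 1

print(df)
```

Because the data is sorted by start date, a single comparison per row is enough; no all-pairs search is needed.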
But I got the same error: ValueError: need at least one array to concatenate
I found this solution, but it returns too many True values, maybe because of ID 1?
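import itertools
import pandas as pd

# Build one pd.Interval per row (assumes 'start' and 'end' are already datetimes).
intervals = df.apply(lambda row: pd.Interval(row['start'], row['end']), axis=1)
# Compare every interval with every interval, including itself.
overlaps = [
    (i, j, x, y, x.overlaps(y))
    for ((i, x), (j, y))
    in itertools.product(enumerate(intervals), repeat=2)
]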
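A likely reason for the many True values is that `itertools.product(..., repeat=2)` yields every ordered pair, so each interval is also compared with itself and each pair is counted in both directions. The sketch below contrasts that with comparing only neighbouring intervals. The three rows are hypothetical, and `closed="both"` is an assumption made here so that an interval whose start equals its end still counts as a point rather than an empty interval.

```python
import itertools
import pandas as pd

# Three hypothetical rows in the same shape as the question's data.
df = pd.DataFrame({
    "start": pd.to_datetime(["2013-04-01", "2013-10-01", "2014-04-01"]),
    "end": pd.to_datetime(["2013-10-31", "2014-03-31", "2014-10-31"]),
})

# closed="both" is an assumption: both endpoints belong to the interval.
intervals = [
    pd.Interval(s, e, closed="both")
    for s, e in zip(df["start"], df["end"])
]

# All n*n ordered pairs, including each interval paired with itself.
all_pairs = [x.overlaps(y) for x, y in itertools.product(intervals, repeat=2)]

# Only each interval paired with its successor: n-1 comparisons.
adjacent = [x.overlaps(y) for x, y in zip(intervals, intervals[1:])]

print(sum(all_pairs), "of", len(all_pairs), "ordered pairs overlap")
print(adjacent)
```

Here only the first two intervals really overlap, yet the all-pairs version reports five True values (three self-comparisons plus the one real overlap in both directions), while the adjacent version reports a single True.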