Finding and averaging elements across multiple dataframes that are within 10% of each other
I have objects that store their values as dataframes. I have been able to check whether values from two dataframes are within 10% of each other. However, I am having difficulty extending this to multiple dataframes. Moreover, how should I approach this problem if the dataframes are not the same size?
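    def add_well_peak(self, *others):
        for other in others:  # *others is a tuple, so handle each object in turn
            if len(self.Bell) == len(other.Bell):  # if dataframes ARE the same size
                for k in range(len(self.Bell)):
                    size_k = int(self.Size[k])
                    for j in range(len(other.Bell)):
                        # match when other.Size[j] lies within 10% of self.Size[k]
                        if size_k - size_k * (1 / 10) <= int(other.Size[j]) <= size_k + size_k * (1 / 10):
                            pass  # average all (not implemented yet)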
For example, in the image below, there are objects that contain dataframes (i.e., self, other1, other2). The colors represent matches (i.e., values that are within 10% of each other). If a match exists, the matching values should be averaged. If no match exists, the unmatched number should still be included. I want to be able to generalize this to any number of objects greater than or equal to 2 (other1, other2, other3, ...). Any help would be appreciated. Please let me know if anything is unclear. This is my first time posting. Thanks again.
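One possible reading of this requirement, as a minimal sketch rather than a definitive implementation: pool the values from every object, sort them, and group each value with the current group when it lies within 10% of that group's smallest value. The function name `average_matches`, the `tolerance` parameter, and the sample numbers are hypothetical; the sketch assumes positive numeric values that have already been extracted as plain lists (for example with `df["Size"].tolist()`, if the column is called `Size`).

```python
from statistics import mean


def average_matches(series, tolerance=0.10):
    """Pool values from any number of sequences, group values that lie within
    `tolerance` of the smallest value in their group, and average each group.

    Assumes positive values. Unmatched values form a group of one, so they
    are still included in the result unchanged.
    """
    pooled = sorted(v for values in series for v in values)
    groups = []
    for v in pooled:
        # Compare against the first (smallest) value of the current group.
        if groups and v - groups[-1][0] <= tolerance * groups[-1][0]:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [mean(g) for g in groups]


# Hypothetical sample data: three sequences of different lengths.
self_sizes = [100, 200, 500]
other1_sizes = [105, 210]
other2_sizes = [98, 205, 350, 510]
print(average_matches([self_sizes, other1_sizes, other2_sizes]))
```

Because the values are pooled first, the sequences do not need to be the same length. One caveat of this sketch is that two values from the same sequence can end up in the same group; if at most one value per object should contribute to a match, the grouping step would need to track where each value came from.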
Comments (1)
Results:
Using my solution on the dataframes of your image, I get the following:
Threshold outlier = 0.2:
Threshold outlier = 0.5:
Explanations:
Each line is an averaged peak, and the columns hold the different values obtained for that peak. I assumed that the average coming from the largest number of elements is the legitimate one, and that the remaining values within THRESHOLD_OUTLIER are outliers. The columns should be sorted: the more probable a value is as a legitimate peak, the further left it sits (column 0 is the most probable). For instance, on line 3 of the results for an outlier threshold of 0.5, 43586.500000 is an average coming from 3 dataframes, while 35785.333333 comes from only 2, so the first one is the most probable.
Issues:
The solution is quite complicated. I assume a big part of it could be removed, but I can't see how for the moment, and since it works, I'll leave the optimization to you.
Still, I commented the code as best I could, and if you have any questions, do not hesitate to ask!
Files:
CombinationLib.py
Errors.py
main.py
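The contents of these files are not reproduced here. As a rough illustration of the ordering rule described under Explanations (the candidate supported by the most dataframes goes in column 0), a minimal sketch could look like the following. The function name `rank_candidates` and the `(average, n_sources)` pair layout are assumptions made for illustration, not the answer's actual code; the two values and their counts are the ones quoted in the explanation.

```python
def rank_candidates(candidates):
    """Order (average, n_sources) pairs so the best-supported average comes first.

    `sorted` is stable, so candidates with equal support keep their input order.
    """
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Values quoted in the explanation: one average from 3 dataframes, one from 2.
row = [(35785.333333, 2), (43586.5, 3)]
ranked = rank_candidates(row)
print([avg for avg, _ in ranked])
```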