Iterate over pandas row groups and filter outliers based on group mean and standard deviation

Posted on 2025-01-14 06:05:04

I have a column of data consisting of 100 samples. I am trying to exclude the samples that do not fit within their group's mean and standard deviation. The data form 10 consecutive groups of 10 samples, so there are 10 mean values and 10 standard deviation values. Finally, I want to remove the outliers from each group based on that group's own mean and standard deviation.

Here is my code:

Ch_37['MA'] = Ch_37['Rssi-1'].rolling(window=10, center=False).mean()  # moving average of 10 samples
Ch_37['std'] = Ch_37['Rssi-1'].rolling(window=10, center=False).std()  # moving standard deviation of 10 samples

ave = Ch_37['MA'].iloc[9::10].reset_index(drop=True)  # keep every 10th average to form 10 groups
std = Ch_37['std'].iloc[9::10].reset_index(drop=True)

# remove outliers from each group based on group mean and standard deviation
new_rssi_ch_37 = Ch_37['Rssi-1'].iloc[::10].between(ave.iloc[::1].sub(std.iloc[::1].mul(1)),
                                                    ave.iloc[::1].add(std.iloc[::1].mul(1)),
                                                    inclusive=False)

# Some of the data samples are shown below
-37
-49
-52
-69
-42
-50
-46
-34
-37
-59
-61
-72
...

I am stuck on how to iterate through each group and extract the values.


花辞树 2025-01-21 06:05:04

I have added the logic below, and for now it works fine, although it does not look like an elegant solution. After every step of 10, the for loop also prints the column name and data type, which I have not been able to suppress.

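k = 0  # index of the current group in ave/std
for i in range(0, len(Ch_37), 10):
    if k >= len(ave):
        break
    # .loc slicing is label-based and includes both endpoints,
    # so i:i+9 covers 10 rows with the default integer index
    group = Ch_37['Rssi-1'].loc[i:i+9]
    mask = group.between(ave[k] - std[k],
                         ave[k] + std[k],
                         inclusive='both')

    print(group[mask])
    k = k + 1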
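
A possible loop-free variant of the same idea is sketched below. It is only an illustration under some assumptions: the groups are consecutive, non-overlapping blocks of 10 rows, the pandas version is recent enough to accept string values for `inclusive` (1.3 or later), and the `Ch_37` frame built here from random numbers is a hypothetical stand-in for the real data. The names `group_id`, `group_mean`, `group_std` and `filtered` are made up for this example. It uses `groupby(...).transform(...)` instead of the rolling window, and `to_string()` so that the printed output does not carry the name and dtype footer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Ch_37 frame: 100 RSSI samples, default RangeIndex.
rng = np.random.default_rng(0)
Ch_37 = pd.DataFrame({"Rssi-1": rng.integers(-75, -30, size=100)})

# Label consecutive, non-overlapping blocks of 10 rows: 0,0,...,0,1,1,...
group_id = np.arange(len(Ch_37)) // 10

# Broadcast each block's mean and sample standard deviation back onto its rows.
grouped = Ch_37["Rssi-1"].groupby(group_id)
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")

# Keep the samples within one standard deviation of their own block mean.
mask = Ch_37["Rssi-1"].between(group_mean - group_std,
                               group_mean + group_std,
                               inclusive="both")
filtered = Ch_37.loc[mask, "Rssi-1"]

# to_string() prints index and values without the "Name: ..., dtype: ..." footer.
print(filtered.to_string())
```

Because `transform` returns a Series aligned with the original rows, no separate counter such as `k` is needed, and `filtered` keeps the original row labels so the surviving samples can still be traced back to their group.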
