Iterating over groups of pandas rows and filtering outliers by group mean and standard deviation
I have a column of data consisting of 100 samples. I am trying to exclude the samples that do not fit within their group's mean and standard deviation. The data is split into 10 consecutive groups of 10 samples, so there are 10 mean values and 10 standard deviation values. In the end, I want to remove the outliers from each group based on that group's own mean and standard deviation.
Here is my code:
# Rolling mean and standard deviation over a window of 10 samples
Ch_37['MA'] = Ch_37['Rssi-1'].rolling(window=10, center=False).mean()
Ch_37['std'] = Ch_37['Rssi-1'].rolling(window=10, center=False).std()
# Keep every 10th rolling value: one mean and one standard deviation per group of 10
ave = Ch_37['MA'].iloc[9::10].reset_index(drop=True)
std = Ch_37['std'].iloc[9::10].reset_index(drop=True)
# Attempt to remove outliers from each group based on the group mean and standard deviation
# (inclusive='neither' is the pandas >= 1.3 spelling of the older inclusive=False)
new_rssi_ch_37 = Ch_37['Rssi-1'].iloc[::10].between(ave.sub(std.mul(1)),
                                                     ave.add(std.mul(1)),
                                                     inclusive='neither')
# Some of the data samples are shown below
-37
-49
-52
-69
-42
-50
-46
-34
-37
-59
-61
-72
...
I am stuck on how to iterate through each group and extract the values.
Comments (1)
I added the logic below, and for now it works fine, although it does not look like an elegant solution. After every step of 10 samples, the for loop also prints the column name and data type, which I have not been able to suppress.
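For comparison, here is a minimal loop-free sketch of the same idea. It is not the commenter's code. It assumes the groups are consecutive blocks of 10 rows and that an outlier is any sample more than `k` standard deviations from its own block's mean. The helper name `filter_group_outliers` and its parameters `group_size` and `k` are made up for illustration; the sample values are the first ten shown in the question.

```python
import numpy as np
import pandas as pd


def filter_group_outliers(values: pd.Series, group_size: int = 10, k: float = 1.0) -> pd.Series:
    """Keep samples strictly within k standard deviations of their own block's mean.

    Hypothetical helper: blocks are consecutive runs of `group_size` rows.
    """
    # Label rows by position (0,0,...,0,1,1,...) so the result does not depend on the index
    groups = np.arange(len(values)) // group_size
    grouped = values.groupby(groups)
    # transform() broadcasts each block's statistic back to every row of that block
    mean = grouped.transform("mean")
    std = grouped.transform("std")  # sample standard deviation (ddof=1), as in rolling().std()
    # Strictly inside (mean - k*std, mean + k*std); a block with NaN std keeps no rows
    keep = values.sub(mean).abs() < k * std
    return values[keep]


# First ten samples from the question, treated as a single group
rssi = pd.Series([-37, -49, -52, -69, -42, -50, -46, -34, -37, -59], name="Rssi-1")
filtered = filter_group_outliers(rssi, group_size=10, k=1.0)
print(filtered.tolist())
```

Because the mean and standard deviation are broadcast back to every row, the comparison is row by row and no index realignment between the samples and the per-group statistics is needed. As for the column name and data type in the output: printing a pandas Series always appends a `Name: ..., dtype: ...` footer, which is probably what the loop shows. Printing `filtered.tolist()` or `filtered.to_string()` instead leaves that footer out.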