Simple data processing
Let's say I have this set of data. After sorting, the distribution can be plotted as shown below.
M=[-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 20.75 24.25 34.5 49.5]
My question is: how do I find the values that fall within the middle range and record their indices? Should I use a normal distribution or something else? I appreciate your help!
Picture for Jonas
Comments (2)
Assuming your mid range is [-10 10], the indices would be:
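A minimal sketch of that lookup, assuming M is the sorted vector from the question and that both bounds are meant to be inclusive (the inclusiveness and the names lo, hi and idx are assumptions):

```matlab
% Data from the question (already sorted)
M = [-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 ...
     0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 ...
     7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 ...
     20.75 24.25 34.5 49.5];

lo = -10;  hi = 10;                 % assumed mid range bounds
idx = find(M >= lo & M <= hi);      % indices of the values inside [lo, hi]
% For this M, idx is 4:18 (the values -7.375 through 9.7)
```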
Please note that you can also access the values by logical indexing, like:
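As a sketch of the logical-indexing variant, again assuming the range [-10 10] and using illustrative names (mask, midValues) that are not from the original post:

```matlab
% Data from the question (already sorted)
M = [-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 ...
     0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 ...
     7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 ...
     20.75 24.25 34.5 49.5];

mask = (M >= -10) & (M <= 10);   % logical vector, true where M is in range
midValues = M(mask);             % the in-range values, no find() needed
```

The mask can be reused directly wherever the indices would have been, and find(mask) recovers the numeric indices if they are needed later.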
And to get your mid range, just:
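One way this could look, assuming the quantile function is available in your installation (it ships with the Statistics Toolbox) and taking the 0.25 and 0.75 quantiles as the cut-offs. Those two probabilities are only an assumption; pick whatever fraction of the data you consider "middle":

```matlab
% Data from the question (already sorted)
M = [-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 ...
     0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 ...
     7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 ...
     20.75 24.25 34.5 49.5];

p = [0.25 0.75];                      % assumed cut-off probabilities
q = quantile(M(:), p);                % q(1) = lower bound, q(2) = upper bound
idx = find(M >= q(1) & M <= q(2));    % indices of the values in the mid range
```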
Note also that M(:) is used here to ensure that quantile treats M as a vector. You may adopt the convention that all vectors in your programs are column vectors; then most functions automatically treat them correctly.

Update:
Now, a very short description of quantiles: they are points taken from the cumulative distribution function (cdf) of a random variable. (Here your M is assumed to be a kind of cdf, since it is nondecreasing and can be normalized to sum up to 1.) Put simply, the .5 quantile of your data means that 50% of the values are lower than this quantile. More details on quantiles can be found for example here.
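To make the ".5 quantile" statement concrete, here is a small sketch on the data from the question, assuming quantile is available; the names q50 and fracBelow are illustrative:

```matlab
% Data from the question (already sorted)
M = [-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 ...
     0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 ...
     7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 ...
     20.75 24.25 34.5 49.5];

q50 = quantile(M(:), 0.5);       % the .5 quantile, i.e. the median (6.2 here)
fracBelow = mean(M(:) < q50);    % 13 of the 27 values lie strictly below it
```

With only 27 points the fraction is 13/27 rather than exactly one half, because the median itself is one of the samples.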
If you don't know a priori what your middle range is, but you know that you want to discard the outliers at both the start and the end of your curve, and you have the Statistics Toolbox, you can fit a robust linear regression to your data using ROBUSTFIT and keep only the inliers.
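A rough sketch of that idea, assuming the position in the sorted vector is used as the predictor and that points whose final robust weight falls below 0.5 count as outliers. Both the predictor choice and the 0.5 threshold are assumptions, not part of the original answer:

```matlab
% Data from the question (already sorted)
M = [-99 -99 -44.5 -7.375 -5.5 -1.666666667 -1.333333333 -1.285714286 ...
     0.436363636 2.35 3.3 4.285714286 5.052631579 6.2 7.076923077 ...
     7.230769231 7.916666667 9.7 10.66666667 16.16666667 17.4 19.2 19.6 ...
     20.75 24.25 34.5 49.5];

x = (1:numel(M))';                  % assumed predictor: position in sorted order
[b, stats] = robustfit(x, M(:));    % robust line fit (needs Statistics Toolbox)

wMin = 0.5;                         % assumed weight threshold for "inlier"
inlierIdx = find(stats.w >= wMin);  % indices of points the fit trusts
inliers = M(inlierIdx);             % the values kept as the middle range
```

Here stats.w holds the final weight robustfit gave each point, so strongly down-weighted points are the ones far from the fitted line. Thresholding abs(stats.resid) against a multiple of stats.s would be another reasonable criterion.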