Return the middle n from a collection (by value, not index)

Posted on 2024-11-02 03:09:24

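I have a List<int> and I need to remove the outliers, so I want to use an approach where I only take the middle n. I want the middle in terms of values, not index.

For example, given the following list, if I wanted the middle 80%, I would expect 11 and 100 to be removed.

11, 22, 22, 33, 44, 44, 55, 55, 55, 100

Is there an easy/built-in way to do this in LINQ?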

小猫一只 2024-11-09 03:09:24

I have a List<int> and I need to remove the outliers, so I want to use an approach where I only take the middle n. I want the middle in terms of values, not index.

Removing outliers correctly depends entirely on the statistical model that accurately describes the distribution of the data -- which you have not supplied for us.

On the assumption that it is a normal (Gaussian) distribution, here's what you want to do.

First compute the mean. That's easy; it's just the sum divided by the number of items.

Second, compute the standard deviation. Standard deviation is a measure of how "spread out" the data is around the mean. Compute it by:

take the difference of each point from the mean
square the difference
take the mean of the squares -- this is the variance
take the square root of the variance -- this is the standard deviation
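
The four steps above could be sketched in C# roughly as follows. This is a minimal sketch rather than the answerer's own code: the helper name `MeanAndStdDev` is hypothetical, and it computes the population standard deviation (dividing by the count, as the steps describe) rather than the sample standard deviation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var data = new List<int> { 11, 22, 22, 33, 44, 44, 55, 55, 55, 100 };
var stats = MeanAndStdDev(data);
Console.WriteLine($"mean = {stats.Mean:F2}, standard deviation = {stats.StdDev:F2}");

// Hypothetical helper: population mean and standard deviation of a set of ints.
static (double Mean, double StdDev) MeanAndStdDev(IReadOnlyCollection<int> values)
{
    if (values.Count == 0)
        throw new ArgumentException("At least one value is required.", nameof(values));

    double mean = values.Average();               // sum divided by the number of items
    double variance = values
        .Select(v => (v - mean) * (v - mean))     // squared difference from the mean
        .Average();                               // mean of the squares
    return (mean, Math.Sqrt(variance));           // square root of the variance
}
```

For the sample list from the question this gives a mean of 44.1 and a standard deviation of about 23.83.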

In a normal distribution, 80% of the items are within about 1.28 standard deviations of the mean. So, for example, suppose the mean is 50 and the standard deviation is 20. You would expect that 80% of the sample would fall between 50 - 1.28 * 20 and 50 + 1.28 * 20, that is, between 24.4 and 75.6. You can then filter out items from the list that are outside of that range.

Note however that this is not removing "outliers". This is removing elements that are more than 1.28 standard deviations from the mean, in order to get an 80% interval around the mean. In a normal distribution one expects to see "outliers" on a regular basis. 99.73% of items are within three standard deviations of the mean, which means that if you have a thousand observations, it is perfectly normal to see two or three observations more than three standard deviations from the mean! In fact, anywhere up to, say, five observations more than three standard deviations away from the mean when given a thousand observations probably does not indicate an outlier.
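
The interval filter described above might look something like this. It is only a sketch under the assumption that the data is roughly normal; the helper name `WithinStdDevs` is hypothetical, and the multiplier `k` is left as a parameter (roughly 1.28 corresponds to a central 80% interval under a normal distribution).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var data = new List<int> { 11, 22, 22, 33, 44, 44, 55, 55, 55, 100 };
List<int> kept = WithinStdDevs(data, 1.28);
Console.WriteLine(string.Join(", ", kept));

// Hypothetical helper: keep values within k population standard deviations of the mean.
static List<int> WithinStdDevs(IReadOnlyCollection<int> values, double k)
{
    if (values.Count == 0) return new List<int>();

    double mean = values.Average();
    double stdDev = Math.Sqrt(values.Average(v => (v - mean) * (v - mean)));
    double lower = mean - k * stdDev;
    double upper = mean + k * stdDev;

    // Where keeps the original order of the surviving items.
    return values.Where(v => v >= lower && v <= upper).ToList();
}
```

On the question's sample the bounds come out to roughly 13.6 and 74.6, so 11 and 100 are dropped. Ten points are far too few to tell whether the normal model applies, which is exactly the caveat raised above.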

I think you need to very carefully define what you mean by outlier and describe why you are attempting to eliminate them. Things that look like outliers are potentially not outliers at all; they may be real data that you should be paying attention to.

Also, note that none of this analysis is correct if the normal distribution is incorrect! You can get into big, big trouble eliminating what look like outliers when in fact you've actually got the entire statistical model wrong. If the model is more "tail heavy" than the normal distribution then outliers are common, and not actually outliers. Be careful! If your distribution is not normal then you need to tell us what the distribution is before we can recommend how to identify outliers and eliminate them.

难得心□动 2024-11-09 03:09:24

You could use the Enumerable.OrderBy method to sort your list, then use the Enumerable.Skip and Enumerable.Take methods, e.g.:

var result = nums.OrderBy(x => x).Skip(1).Take(8);

Where nums is your list of integers.

Figuring out what values to use as arguments for Skip and Take should look something like this, if you just want the "middle n values":

nums.OrderBy(x => x).Skip((nums.Count - n) / 2).Take(n);

However, when the result of (nums.Count - n) / 2 is not an integer, how do you want the code to behave?
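
To make that open question concrete: in C#, `(nums.Count - n) / 2` on two ints is integer division, so an odd difference is truncated and the extra element ends up being trimmed from the high end. A minimal sketch, with a hypothetical helper name `MiddleN`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var nums = new List<int> { 55, 11, 44, 22, 100, 33, 55, 22, 44, 55 };

Console.WriteLine(string.Join(", ", MiddleN(nums, 8))); // trims one value from each end
Console.WriteLine(string.Join(", ", MiddleN(nums, 7))); // trims one low value and two high values

// Hypothetical helper: sort, then keep the n values in the middle of the sorted order.
static List<int> MiddleN(IReadOnlyCollection<int> values, int n)
{
    if (n < 0 || n > values.Count)
        throw new ArgumentOutOfRangeException(nameof(n));

    int skip = (values.Count - n) / 2; // integer division truncates toward zero
    return values.OrderBy(x => x).Skip(skip).Take(n).ToList();
}
```

If the asymmetric trim is not what you want, the rounding rule has to be chosen explicitly, for example by rounding `skip` up instead of down.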

三寸金莲 2024-11-09 03:09:24

Assuming you're not doing any weighted average funny business:

List<int> ints = new List<int>() { 11,22,22,33,44,44,55,55,55,100 };

int min = ints.Min();
double range = ints.Max() - min;

// Weight is each value's position within [min, max], scaled to 0..1.
var results = ints.Select(o => new { IntegralValue = o, Weight = (o - min) / range });

// Keep the values that fall in the middle 80% of the range.
var middle = results.Where(o => o.Weight >= .1 && o.Weight < .9);

You can then filter on Weight as needed. Drop the top/bottom n% as desired.

In your case:

results.Where(o => o.Weight >= .1 && o.Weight < .9)

Edit: As an extension method, because I like extension methods:

public static class Lulz
{
    public static List<int> MiddlePercentage(this List<int> ints, double Percentage)
    {
        int min = ints.Min();
        double range = (ints.Max() - min);

        var results = ints.Select(o => new { IntegralValue = o, Weight = (o - min) / range} );

        double tolerance = (1 - Percentage) / 2;
        return results.Where(o => o.Weight >= tolerance && o.Weight < 1 - tolerance).Select(o => o.IntegralValue).ToList();
    }
}

Usage:

List<int> ints = new List<int>() { 11,22,22,33,44,44,55,55,55,100 };
var results = ints.MiddlePercentage(.8);

醉殇 2024-11-09 03:09:24

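Typically, if you want to exclude statistical outliers from a set of values, you compute the arithmetic mean and standard deviation of the set, and then remove the values that lie farther from the mean than you would like (measured in standard deviations). A normal distribution (the classic bell curve) has the following properties:

About 68% of the data lies within +/- 1 standard deviation of the mean.
About 95% of the data lies within +/- 2 standard deviations of the mean.
About 99.7% of the data lies within +/- 3 standard deviations of the mean.

See http://www.codeproject.com/KB/linq/LinqStatistics.aspx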

枕梦 2024-11-09 03:09:24

I am not going to question the validity of calculating outliers since I had a similar need to do exactly this kind of selection. The answer to the specific question of taking the middle n is:

List<int> ints = new List<int>() { 11,22,22,33,44,44,55,55,55,100 };
var result = ints.Skip(1).Take(ints.Count() - 2);

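This skips the first item and stops before the last, giving you just the middle items. Note that Skip and Take work by position, so this only drops the smallest and largest values if the list is already sorted. Here is a link to a .NET Fiddle demonstrating this query.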

https://dotnetfiddle.net/p1z7em

简美 2024-11-09 03:09:24

I have a List and I need to remove the outliers so want to use an approach where I only take the middle n. I want the middle in terms of values, not index.

If I understand correctly we want to keep any values that fall into the middle 80% of the 11-100 range, or

min + (max - min - (max - min) * 0.8) / 2 < x < max - (max - min - (max - min) * 0.8) / 2

Assuming an ordered list, we can SkipWhile the values are lower than the lowerBound, and then TakeWhile the numbers are lower than the upperBound:

public void Calculate()
{
    var numbers = new[] { 11, 22, 22, 33, 44, 44, 55, 55, 55, 100 };

    var percentage = 0.8;

    var result = RemoveOutliers(numbers, percentage);
}

private IEnumerable<int> RemoveOutliers(int[] numbers, double percentage)
{
    int min = numbers.First();
    int max = numbers.Last();
    double range = (max - min);
    double lowerBound = min + (range - range * percentage) / 2;
    double upperBound = max - (range - range * percentage) / 2;
    return numbers.SkipWhile(n => n < lowerBound).TakeWhile(n => n < upperBound);   
}
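
If the input is not guaranteed to be sorted, the same bounds can be applied with `Where` instead of `SkipWhile`/`TakeWhile`. The following is a sketch under that assumption; the helper name `MiddleOfRange` is hypothetical, and it uses the strict comparisons from the inequality above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new[] { 55, 100, 22, 44, 11, 33, 55, 22, 44, 55 }; // deliberately unsorted
List<int> middle = MiddleOfRange(numbers, 0.8);
Console.WriteLine(string.Join(", ", middle));

// Hypothetical helper: keep values strictly inside the middle `percentage` of [min, max].
static List<int> MiddleOfRange(IReadOnlyCollection<int> values, double percentage)
{
    if (values.Count == 0) return new List<int>();

    int min = values.Min();
    int max = values.Max();
    double margin = (max - min) * (1 - percentage) / 2; // width trimmed from each end of the range
    double lowerBound = min + margin;
    double upperBound = max - margin;

    // Where preserves the input order and does not require sorted input.
    return values.Where(x => x > lowerBound && x < upperBound).ToList();
}
```

For the sample data the bounds are roughly 19.9 and 91.1, so 11 and 100 are removed. With strict comparisons, a list whose values are all equal comes back empty, which may or may not be what you want.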
