Apache Commons Math 2.2 Percentile bug?

Posted on 2024-10-29 23:58:14

I am not 100% sure whether this is a bug or whether I am doing something wrong, but if you give Percentile a large amount of data that consists entirely of the same value (see the code below), the evaluate method takes a very long time. If you give Percentile random values instead, evaluate takes considerably less time.

As noted below, Median is a subclass of Percentile.

Percentile Javadoc

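import java.util.ArrayList;
import java.util.List;
import java.util.Random;
// Assumed: Commons Lang 2.x; in Commons Lang 3 the package is org.apache.commons.lang3.
import org.apache.commons.lang.ArrayUtils;
import org.apache.commons.math.stat.descriptive.rank.Median;

// Slow case: 200,000 copies of the same value.
private void testOne(){
  int size = 200000;
  int sameValue = 100;
  List<Double> list = new ArrayList<Double>();

  for (int i = 0; i < size; i++)
  {
    list.add((double)sameValue);
  }
  Median m = new Median();
  m.setData(ArrayUtils.toPrimitive(list.toArray(new Double[0])));

  // Time only the evaluate() call, not the data setup.
  long start = System.currentTimeMillis();
  System.out.println("Start:"+ start);

  double result = m.evaluate();

  System.out.println("Result:" + result);
  System.out.println("Time:"+ (System.currentTimeMillis()- start));
}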


// Fast case: 200,000 random values in [0, 100).
private void testTwo(){
  int size = 200000;
  List<Double> list = new ArrayList<Double>();

  Random r = new Random();

  for (int i = 0; i < size; i++)
  {
    list.add(r.nextDouble() * 100.0);
  }
  Median m = new Median();
  m.setData(ArrayUtils.toPrimitive(list.toArray(new Double[0])));

  // Time only the evaluate() call, not the data setup.
  long start = System.currentTimeMillis();
  System.out.println("Start:"+ start);

  double result = m.evaluate();

  System.out.println("Result:" + result);
  System.out.println("Time:"+ (System.currentTimeMillis()- start));
}
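
To judge whether the timings from the two tests are reasonable, a sort-based median that uses only the JDK can serve as a baseline. This is a minimal sketch, not the library's method: the class name SortedMedian and its median helper are made up for illustration, and it assumes the usual convention of averaging the two middle values for even-length input.

```java
import java.util.Arrays;

// Hypothetical baseline, not part of Commons Math: median via a full sort.
final class SortedMedian {

    // Returns the median of data without modifying the caller's array.
    static double median(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        // Even length: average the two middle values.
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Same shape as testOne: 200,000 copies of one value.
        double[] same = new double[200000];
        Arrays.fill(same, 100.0);

        long start = System.nanoTime();
        double result = median(same);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;

        System.out.println("Result:" + result);
        System.out.println("Time:" + elapsedMs);
    }
}
```

If the library call takes far longer than this baseline on identical values but not on random values, that points at the selection step rather than at the size of the input.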

Comments (2)

妖妓 2024-11-05 23:58:14

This is a known issue that appeared between versions 2.0 and 2.1 and has been fixed for version 3.1.

Version 2.0 did indeed sort the data, but in 2.1 they seem to have switched to a selection algorithm. However, a bug in that implementation led to bad behavior for data with many identical values. Basically, they used >= and <= instead of > and <.
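
To see why that choice of comparison matters, here is a minimal quickselect sketch. It is not the Commons Math source: the class EqualKeysDemo, its methods, and the skipEquals flag are hypothetical names for illustration. With skipEquals set to true the scans pass over elements equal to the pivot (the >= and <= style), and with false they stop on them (the > and < style). The sketch counts element comparisons on an array of identical values.

```java
import java.util.Arrays;

// Hypothetical illustration, not the library's code.
final class EqualKeysDemo {
    private static long comparisons;

    // Counted comparison: x < y, or x <= y when orEqual is true.
    private static boolean before(double x, double y, boolean orEqual) {
        comparisons++;
        return orEqual ? x <= y : x < y;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Partitions a[lo..hi] around a[lo] and returns the pivot's final index.
    static int partition(double[] a, int lo, int hi, boolean skipEquals) {
        double pivot = a[lo];
        int i = lo + 1;
        int j = hi;
        while (true) {
            while (i <= j && before(a[i], pivot, skipEquals)) i++;
            while (i <= j && before(pivot, a[j], skipEquals)) j--;
            if (i >= j) break;
            swap(a, i, j);
            i++;
            j--;
        }
        swap(a, lo, j);
        return j;
    }

    // Returns the k-th smallest value (0-based) of data; works on a copy.
    static double select(double[] data, int k, boolean skipEquals) {
        double[] a = data.clone();
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi, skipEquals);
            if (p == k) break;
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return a[k];
    }

    // Runs select and reports how many element comparisons it made.
    static long countComparisons(double[] data, int k, boolean skipEquals) {
        comparisons = 0;
        select(data, k, skipEquals);
        return comparisons;
    }

    public static void main(String[] args) {
        double[] same = new double[20000];
        Arrays.fill(same, 100.0);
        int k = same.length / 2;
        System.out.println("skip equals:    " + countComparisons(same, k, true));
        System.out.println("stop on equals: " + countComparisons(same, k, false));
    }
}
```

In this sketch, skipping equal elements makes every partition of an all-equal array shrink the range by only one element, so the work grows quadratically with the input size. Stopping on equal elements splits the range near the middle and keeps the work roughly linear.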

吲‖鸣 2024-11-05 23:58:14

It's well known that some algorithms can exhibit slower performance for certain data sets. Performance can actually be improved by randomizing the data set before performing the operation.

Since percentile probably involves sorting the data, I'm guessing that your "bug" is not really a defect in the code, but rather the manifestation of one of the slower performing data sets.
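
The randomization idea can be sketched as follows. This is a hypothetical illustration with made-up names (ShuffleDemo, shuffled, countComparisons), not anything from Commons Math: a quickselect that always picks the first element as pivot is run on already-sorted input and then on a shuffled copy, counting comparisons each time. It also runs on an array of identical values, where shuffling leaves the array unchanged and so cannot change the count.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of shuffling before a first-element-pivot quickselect.
final class ShuffleDemo {
    private static long comparisons;

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Lomuto-style partition of a[lo..hi] around a[lo]; returns the pivot's final index.
    private static int partition(double[] a, int lo, int hi) {
        double pivot = a[lo];
        int store = lo;
        for (int i = lo + 1; i <= hi; i++) {
            comparisons++;
            if (a[i] < pivot) {
                store++;
                swap(a, store, i);
            }
        }
        swap(a, lo, store);
        return store;
    }

    // Returns the k-th smallest value (0-based) of data; works on a copy.
    static double select(double[] data, int k) {
        double[] a = data.clone();
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) break;
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return a[k];
    }

    // Runs select and reports how many element comparisons it made.
    static long countComparisons(double[] data, int k) {
        comparisons = 0;
        select(data, k);
        return comparisons;
    }

    // Fisher-Yates shuffle of a copy, seeded so runs are repeatable.
    static double[] shuffled(double[] data, long seed) {
        double[] a = data.clone();
        Random r = new Random(seed);
        for (int i = a.length - 1; i > 0; i--) {
            swap(a, i, r.nextInt(i + 1));
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 2000;
        double[] sorted = new double[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        double[] same = new double[n];
        Arrays.fill(same, 100.0);

        System.out.println("sorted:         " + countComparisons(sorted, n / 2));
        System.out.println("shuffled:       " + countComparisons(shuffled(sorted, 42L), n / 2));
        System.out.println("same:           " + countComparisons(same, n / 2));
        System.out.println("same, shuffled: " + countComparisons(shuffled(same, 42L), n / 2));
    }
}
```

In this sketch shuffling helps a great deal on sorted input, but it does nothing for the all-identical case from the question, because every permutation of that array is the same array.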
