并发方法是加速长时间迭代的好主意吗?

发布于 2024-09-16 04:36:12 字数 1036 浏览 8 评论 0原文

我有一个应用程序,它会随着时间的推移进行迭代以在图表上创建点。 当我收集 x 轴上每个点的数据时,我还必须执行递归查找,这实际上意味着我在另一个循环中有一个循环。这扩展得不太好。我没有看到很多在迭代中使用“分而治之”解决方案的例子。我正在考虑使用 Java 的执行器并发框架在它自己的线程中运行每个循环,等待答案,收集结果并返回它们。我得到的初步测试结果似乎并没有那么快。我知道我应该展示一些代码,但我首先想知道的是,与我可能不熟悉的更好方法相比,这种方法是否具有优点。 提前致谢!

添加一些 groovyish/javaish 伪代码来帮助思考这个问题:

class Car {
    id
    model 
    make
    weight
}

for (number in listOfImportantCarIDs) {
    Car car = carsMap.get(number) // find the car we care about
    String maker = car.make //get it's 'parent' 

    // get amount of all related cars
    Iterator<Car> allcars = carsMap.values().iterator();
    while (allcars.hasNext()) {
        Car aCar = alldocs.next();
        if (maker.equals(aCar.make)) {
            totalCarCount++;  // increment total related cars 
            BigDecimal totalWeightofAllCars = totalWeightofAllCars.add(aCar.getWeight()); // add weight to total

            // a ghetto cache to prevent double  counting
            countedMaufacturers.add(make); 
        }     
    }
}

I have an app that does an iteration to create points on a graph over time.
While I'm gathering data for each point across the x-axis I also must execute a recursive lookup which effectually means I have a loop inside another loop. This is not scaling too well. I don't see a lot of examples of using a "divide and conquer" solution on iterations. I was thinking of using Java's Executor concurrency framework to run each loop in it's own thread, await the answers, gather the results and return them. The initial test results I'm getting don't seem that much faster. I know I should show some code but what I want to know first is if this approach has merit compared to better methods that I may not be familiar with.
Thanks in advance!

Adding some groovyish/javaish pseudo code to assist thinking about this:

class Car {
    id
    model 
    make
    weight
}

for (number in listOfImportantCarIDs) {
    Car car = carsMap.get(number) // find the car we care about
    String maker = car.make //get it's 'parent' 

    // get amount of all related cars
    Iterator<Car> allcars = carsMap.values().iterator();
    while (allcars.hasNext()) {
        Car aCar = alldocs.next();
        if (maker.equals(aCar.make)) {
            totalCarCount++;  // increment total related cars 
            BigDecimal totalWeightofAllCars = totalWeightofAllCars.add(aCar.getWeight()); // add weight to total

            // a ghetto cache to prevent double  counting
            countedMaufacturers.add(make); 
        }     
    }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

一世旳自豪 2024-09-23 04:36:12

使用线程可以使应用程序加速一些小的常数因子,但代价是显着增加线程间通信的复杂性。如果存在更好的算法,那么可以节省几个数量级。因此,我强烈建议您首先验证是否确实没有次二次算法可以解决您的问题。

也许如果您详细说明了您要解决的问题以及当前的解决方案,我们可以在这里提供帮助。

编辑:天哪,找到一个更好的算法一点也不难:

for (Car car : cars {
    Stats s = stats.get(car.maker);
    if (s == null) {
        s = new Stats();
        stats.put(car.maker, s);
    }
    stats.count++;
    stats.totalWeight+=car.weight;
}

for (Car car in importantCars) {
    stats.get(car.maker);
}

对于每辆重要的汽车,实际上没有必要迭代所有汽车,只是为了找到具有相同制造商的汽车......

Using threads will speed up your application by some small constant factor, at the price of significant added complexity for inter-thread communication. If a better algorithm exists, that can save you orders of magnitude. Therefore, I strongly recommend that you first verify that there indeed isn't a sub-quadratic algorithm to solve your problem.

Perhaps if you detailed the problem you are trying to solve and your current solution, we could assist here.

Edit: Goodness, finding a much better algorithm isn't hard at all:

for (Car car : cars {
    Stats s = stats.get(car.maker);
    if (s == null) {
        s = new Stats();
        stats.put(car.maker, s);
    }
    stats.count++;
    stats.totalWeight+=car.weight;
}

for (Car car in importantCars) {
    stats.get(car.maker);
}

There really is no need to iterate, for each important car, over all cars just to find those with identical maker ...

黎夕旧梦 2024-09-23 04:36:12

如果计算每个 x 值的 y 值的任务对于每个 x 值来说是完全原子的,那么我认为这非常适合执行器服务。大多数性能问题只需要测量,即使在对解决方案进行了大量推理之后也是如此。 CPU 密集问题的最佳线程数是 p 或 p+1,请记住这一点。

您是否研究过动态编程方法?它适用于您的问题吗?本质上,递归意味着您一遍又一遍地解决相同的问题,但输入值稍小。当运行同一递归算法的多次迭代时,程序往往会重新解决相同的问题。相反,动态编程方法可能会将解存储在缓存中,并在第二次计算解之前引用缓存。

在不知道您的具体问题的情况下,很难给出准确的答案。

If the task for computing the y value for each x value is completely atomic for each x value, then I would think this is a good fit for an executor service. Most performance questions simply require measure, even after great lengths of reasoning about the solution. The optimal number of threads for CPU bound problems is p or p+1, keep that in mind.

Have you looked at a Dynamic Programming approach? Is it applicable for your problem? Essentially, recursion means you are solving the same problem over and over again but for slightly smaller input values. When running multiple iterations of the same recursive algorithm a program tends to often re-solve the same exact problem. Instead, a dynamic programming approach might store the solution in a cache and reference the cache before calculation the solution a second time.

Without knowing your exact problem, it is difficult to give an exact answer.

痞味浪人 2024-09-23 04:36:12

这取决于是什么让你的循环变慢。他们正在执行数据库查询吗?访问硬盘?否则会做一些导致他们等待外部 I/O 操作的事情吗?在这种情况下,最好的选择是开始添加线程,直到您不再看到它们的回报,您基本上必须对其进行调整。

如果它们的速度很慢,仅仅是因为 Java 内存中发生了原始处理,那么您可以尝试为计算机上的每个 CPU 核心添加一个线程,但除此之外可能看不到太多好处。

It depends on what's making your loops slow. Are they executing database queries? Accessing a hard disk? Otherwise doing something that causes them to wait around for an external I/O operation? In that case, your best bet is to start adding threads until you stop seeing returns on them, you basically have to tune it.

If they're slow simply because of raw processing taking place in memory in Java, then you could try adding a thread per CPU core on your machine, but probably won't see much benefit beyond that.

眼眸里的那抹悲凉 2024-09-23 04:36:12

如果您的“源”数据没有被您的算法修改,或者每次迭代仅对其自己的数据进行操作(不会更改附近的数据),那么它可以在双核上为您带来(最大)2 倍的速度提升。

这不是一个很好的解决方案,随着当前解决方案时间的增长,这将以 1/2 的速度增长,如果您处于双嵌套循环中,这仍然相当昂贵。 O(X^2/2) 仍然几乎等于 O(X^2)。

如果你能找到一种方法来优化你的算法,这种方法具有更高的真正成功的潜力,并且不太可能花费大量的时间来修复错误。

If your "Source" data isn't modified by your algorithm or each iteration only operates on it's own data (doesn't change nearby data) it could give you a (max) 2x speed boost on a dual-core.

That's not a great solution and as the time grows for your current solution, this will grow at 1/2 which is still quite expensive if you are in a double-nested loop. O(X^2/2) still pretty much equals O(X^2).

If you can find a way to optimize your algorithm that has a much higher potential of real success and much less of a potential for an amazing amount of time sunk into bug fixing.

带刺的爱情 2024-09-23 04:36:12

X轴上的每个点可以并行计算吗?如果是这样,这就是您通过多线程提高性能的机会。

Fork-Join 框架将使这种程序变得简单。您可以获得早期访问版本,或者在某种程度上自己模仿。

Can each point along the X-axis be computed in parallel? If so, that's your opportunity for performance improvements via multi-threading.

The Fork-Join framework will make this kind of program simple. You can get an early access version, or imitate it yourself to some extent.

鲸落 2024-09-23 04:36:12

“一个循环套在另一个循环中”应该敲响警钟;)。外循环总是等待内循环。所以这是一个串行执行,使用线程不会有任何改变。除非每个循环将结果写入中央服务,该服务可以通过 x 轴上的后续事件(循环)进行查阅。所以服务将是有状态的......

"a loop inside another loop" that should ring bells ;). The outer loop is always waiting for the inner one. So this is a serial execution and using threads wont change a bit. Unless each loop writes the result to a central service which can be consulted by a following event (loop) on your x-axis. So the service would be stateful...

带上头具痛哭 2024-09-23 04:36:12

我不认为并发可以使你的应用程序更快。它可以帮助您在应用程序中运行多个任务,但不会使它们更快。

我的经验:
尝试摆脱递归调用,这是使应用程序更快的最佳方法。

i don't think concurrency can make your application faster. it can help you to run multiple task in your application, but it won't make them faster.

my experience:
try to get rid of recursive calls, that's the best way to make the app faster.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文