Using threading to create a large number of charts quickly

Posted 2024-10-01 05:18:16

I have been trying to find ways to make the following piece of code perform faster:

import cStringIO
import pylab

def do_chart(target="IMG_BACK", xlabel="xlabel", ylabel="ylabel", title="title",
             ydata=pylab.arange(1961, 2031, 1)):
    global MYRAMDICT
    MYRAMDICT = {}
    print "here"
    for i in range(70):
        # One in-memory file per chart
        MYRAMDICT[i] = cStringIO.StringIO()
        xdata = pylab.arange(1961, 2031, 1)
        pylab.figure(num=None, figsize=(10.24, 5.12), dpi=1, facecolor='w', edgecolor='k')
        pylab.plot(xdata, ydata, linewidth=3.0)
        pylab.xlabel(xlabel)
        pylab.ylabel(ylabel)
        pylab.title(i)
        pylab.grid(True)
        pylab.savefig(MYRAMDICT[i], format='png')
        pylab.close()

This function (please ignore the pylab commands, they are here just for illustration) creates a dictionary (MYRAMDICT), which I populate with cStringIO objects that store the charts in memory. These charts are later presented to the user dynamically.

Would somebody please help me to make use of threading so that I can use all of my cores and make this function perform faster? Or point me towards ideas to improve it?

Comments (2)

旧情别恋 2024-10-08 05:18:16

For what you describe, you'd be far better off using multiprocessing than threading. You have an "embarrassingly parallel" problem and no disk I/O constraints (you're writing to memory). Of course, passing large objects back and forth between the processes gets expensive, but returning a string representing a .png shouldn't be too bad.

It can be done quite simply:

import multiprocessing
import cStringIO

import matplotlib.pyplot as plt
import numpy as np

import itertools

def main():
    """Generates 1000 random plots and saves them as .png's in RAM"""
    pool = multiprocessing.Pool()
    same_title = itertools.repeat('Plot %i')
    fig_files = pool.map(plot, itertools.izip(xrange(1000), same_title))
    pool.close()
    pool.join()

def plot(args):
    """Make a random plot"""
    # Unfortunately, pool.map (and imap) only support a single argument to
    # the function, so you'll have to unpack a tuple of arguments...
    i, titlestring = args

    outfile = cStringIO.StringIO()

    x = np.cumsum(np.random.random(100) - 0.5)

    fig = plt.figure()
    plt.plot(x)
    # Set the title before saving, otherwise it is missing from the image
    plt.title(titlestring % i)
    fig.savefig(outfile, format='png', bbox_inches='tight')
    plt.close(fig)

    # cStringIO files aren't picklable, so we'll return the string instead...
    return outfile.getvalue()

# The guard keeps worker processes from re-running main() on platforms
# that start them by importing this module (e.g. Windows)
if __name__ == '__main__':
    main()

Without using multiprocessing, this takes ~250 secs on my machine. With multiprocessing (8 cores), it takes ~40 secs.

Hope that helps a bit...
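
For readers on Python 3, the same pattern might look like the sketch below. It is not the answerer's code: `render_png` is a hypothetical stand-in that only fabricates a few bytes so the example runs without matplotlib, and `build_ram_dict` is an assumed helper name. In real use, the body of `render_png` would draw the figure and call `savefig` on the buffer. Only the indices are sent to the workers, and the returned bytes are wrapped in `io.BytesIO` objects keyed by index, which mirrors the asker's MYRAMDICT.

```python
import io
import multiprocessing


def render_png(i):
    """Hypothetical stand-in for the plotting code: returns bytes for chart i."""
    buf = io.BytesIO()
    # Real code would call fig.savefig(buf, format="png") here instead.
    buf.write(("chart %d" % i).encode("ascii"))
    # Returning plain bytes keeps what crosses the process boundary simple.
    return buf.getvalue()


def build_ram_dict(png_blobs):
    """Wrap each result in a file-like object, keyed by its position."""
    return {i: io.BytesIO(blob) for i, blob in enumerate(png_blobs)}


def main(count=70):
    # By default, Pool() starts as many workers as the machine reports CPUs.
    with multiprocessing.Pool() as pool:
        # map() preserves input order, so result k belongs to chart k.
        blobs = pool.map(render_png, range(count))
    return build_ram_dict(blobs)


if __name__ == "__main__":
    ram_dict = main()
    print(len(ram_dict), "charts rendered")
```

Each value in the resulting dictionary can be read like a file, for example `ram_dict[0].read()`.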

自在安然 2024-10-08 05:18:16

Threading will help you if and only if pylab releases the GIL while executing.
Moreover, pylab must be thread-safe, and your code must use it in a thread-safe way, which may not always be the case.
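
A rough way to check whether threads would help is to time the same batch of jobs serially and on a thread pool. The sketch below (Python 3, standard library only) does that for a hypothetical pure-Python function `busy_work`, which stands in for a rendering call that holds the GIL; `time_serial_and_threaded` is likewise an assumed helper name. On standard CPython, a function like this typically shows little or no gain from threads, whereas a call that releases the GIL while it works would.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_work(n):
    """Hypothetical CPU-bound job written in pure Python (holds the GIL)."""
    total = 0
    for k in range(n):
        total += k * k
    return total


def time_serial_and_threaded(func, jobs, workers=4):
    """Run func over jobs serially and on threads; return results and timings."""
    start = time.perf_counter()
    serial = [func(job) for job in jobs]
    serial_secs = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        threaded = list(pool.map(func, jobs))
    threaded_secs = time.perf_counter() - start

    return serial, threaded, serial_secs, threaded_secs


if __name__ == "__main__":
    jobs = [200000] * 8
    serial, threaded, t_serial, t_threaded = time_serial_and_threaded(busy_work, jobs)
    # Both runs must agree; only the timings tell you whether threads helped.
    assert serial == threaded
    print("serial: %.2fs  threaded: %.2fs" % (t_serial, t_threaded))
```

Swapping the real chart-rendering function in for `busy_work` gives a quick measurement for the actual workload.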

That said, if you are going to use threads, I think this is a classic case for a job queue; I would therefore use a Queue object, which takes care of this pattern nicely.

Here is an example I put together just by combining your code with the example given in the Queue documentation. I did not even check it thoroughly, so it WILL have bugs; it is meant to give an idea more than anything else.

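import cStringIO
from Queue import Queue
from threading import Thread

import pylab

num_worker_threads = 4  # assumed value; the original left this undefined
MYRAMDICT = {}
q = Queue()

# "Business" code
def do_chart(target="IMG_BACK", xlabel="xlabel", ylabel="ylabel", title="title",
             ydata=pylab.arange(1961, 2031, 1)):
    global MYRAMDICT
    MYRAMDICT = {}
    print "here"
    for i in range(70):
        # Each job carries everything do_work needs
        q.put((i, xlabel, ylabel, ydata))
    q.join()       # block until all tasks are done

def do_work(i, xlabel, ylabel, ydata):
    # NOTE: pylab tracks a global "current figure", so running these calls
    # from several threads at once is subject to the thread-safety caveat above
    MYRAMDICT[i] = cStringIO.StringIO()
    xdata = pylab.arange(1961, 2031, 1)
    pylab.figure(num=None, figsize=(10.24, 5.12), dpi=1, facecolor='w', edgecolor='k')
    pylab.plot(xdata, ydata, linewidth=3.0)
    pylab.xlabel(xlabel); pylab.ylabel(ylabel); pylab.title(i)
    pylab.grid(True)
    pylab.savefig(MYRAMDICT[i], format='png')
    pylab.close()


# Handling the queue
def worker():
    while True:
        args = q.get()
        try:
            do_work(*args)
        finally:
            # Always mark the job done so q.join() cannot hang on an error
            q.task_done()

for _ in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()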
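
On Python 3, `concurrent.futures.ThreadPoolExecutor` from the standard library wraps the same worker-and-queue pattern. The sketch below swaps it in for the hand-written queue; it is an alternative, not the code from this answer. `render_chart` is a hypothetical stand-in that writes placeholder bytes instead of calling pylab, and `render_all` is an assumed helper name. The caveats about the GIL and thread safety still apply to whatever real drawing code replaces the stand-in.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def render_chart(i, xlabel, ylabel):
    """Hypothetical stand-in for the pylab calls; returns an in-memory file."""
    buf = io.BytesIO()
    buf.write(("chart %d: %s vs %s" % (i, ylabel, xlabel)).encode("utf-8"))
    buf.seek(0)  # rewind so the buffer is ready to be read
    return buf


def render_all(count, xlabel="xlabel", ylabel="ylabel", num_worker_threads=4):
    """Render charts 0..count-1 on a thread pool and return {index: buffer}."""
    with ThreadPoolExecutor(max_workers=num_worker_threads) as pool:
        # submit() queues one job per chart and hands back a Future for it.
        futures = {i: pool.submit(render_chart, i, xlabel, ylabel)
                   for i in range(count)}
        # result() blocks until that job is done and re-raises its exception.
        return {i: future.result() for i, future in futures.items()}


if __name__ == "__main__":
    ram_dict = render_all(70)
    print(len(ram_dict), "charts rendered")
```

Because the dictionary is assembled in the calling thread from the futures, no lock is needed around it.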