蟒蛇 &多处理，将集合生成分解为子流程

发布于 2024-10-04 10:36:58 字数 1481 浏览 4 评论 0原文

我必须根据其他字符串的一些计算生成一组字符串。这需要相当长的时间，而且我正在开发多处理器/多核服务器，所以我认为我可以将这些任务分解成块并将它们传递给不同的进程。

首先，我将第一个字符串列表分成每个 10000 个块，将其发送到创建新集合的进程，然后尝试获取锁定并将这些报告回主进程。但是，我的主进程集是空的！

这是一些代码：

def build_feature_labels(self,strings,return_obj,l):
    feature_labels = set()
    for s in strings:
        feature_labels = feature_labels.union(s.get_feature_labels())
    print "method: ", len(feature_labels)
    l.acquire()
    return_obj.return_feature_labels(feature_labels)
    l.release()
    print "Thread Done"

def return_feature_labels(self,labs):
    self.feature_labels = self.feature_labels.union(labs)
    print "length self", len(self.feature_labels)
    print "length labs", len(labs)


current_pos = 0
lock = multiprocessing.Lock()

while current_pos < len(orig_strings):
    while len(multiprocessing.active_children()) > threads:
        print "WHILE: cpu count", str(multiprocessing.cpu_count())
            T.sleep(30)

    print "number of processes", str(len(multiprocessing.active_children()))
    proc = multiprocessing.Process(target=self.build_feature_labels,args=(orig_strings[current_pos:current_pos+self.MAX_ITEMS],self,lock))
    proc.start()
    current_pos = current_pos + self.MAX_ITEMS

    while len(multiprocessing.active_children()) > 0:
        T.sleep(3)


    print len(self.feature_labels)

奇怪的是 a) 主进程上的 self.feature_labels 是空的，但是当从每个子进程调用它时，它有项目。我认为我在这里采取了错误的方法（这就是我过去在 Java 中的做法！）。有更好的方法吗？

提前致谢。

原文

I've got to generate a set of strings based on some calculations of other strings. This takes quite a while, and I'm working on a multiprocessor/multicore server so I figured that I could break these tasks down into chunks and pass them off to different process.

Firstly I break the first list of strings down into chunks of 10000 each, send this off to a process which creates a new set, then try to obtain a lock and report these back to the master process. However, my master processes's set is empty!

Here's some code:

def build_feature_labels(self,strings,return_obj,l):
    feature_labels = set()
    for s in strings:
        feature_labels = feature_labels.union(s.get_feature_labels())
    print "method: ", len(feature_labels)
    l.acquire()
    return_obj.return_feature_labels(feature_labels)
    l.release()
    print "Thread Done"

def return_feature_labels(self,labs):
    self.feature_labels = self.feature_labels.union(labs)
    print "length self", len(self.feature_labels)
    print "length labs", len(labs)


current_pos = 0
lock = multiprocessing.Lock()

while current_pos < len(orig_strings):
    while len(multiprocessing.active_children()) > threads:
        print "WHILE: cpu count", str(multiprocessing.cpu_count())
            T.sleep(30)

    print "number of processes", str(len(multiprocessing.active_children()))
    proc = multiprocessing.Process(target=self.build_feature_labels,args=(orig_strings[current_pos:current_pos+self.MAX_ITEMS],self,lock))
    proc.start()
    current_pos = current_pos + self.MAX_ITEMS

    while len(multiprocessing.active_children()) > 0:
        T.sleep(3)


    print len(self.feature_labels)

What is strange is a) that self.feature_labels on the master process is empty, but when it is called from each sub-process it has items. I think I'm taking the wrong approach here (it's how I used to do it in Java!). Is there a better approach?

Thanks in advance.

分享到QQ

分享到微博