将字符串列表传递给map_async()

发布于 2024-11-16 20:53:06 字数 1529 浏览 3 评论 0原文

我在 map_async 方面遇到了一个有趣的问题,我无法弄清楚。

我正在使用带有进程池的 python 多处理库。我正在尝试传递一个要比较的字符串列表以及一个要与使用 map_async() 的函数进行比较的字符串列表,

现在我有:

from multiprocessing import Pool, cpu_count
import functools

dictionary = /a/file/on/my/disk
passin = /another/file/on/my/disk

num_proc = cpu_count()

dictionary = readFiletoList(fdict)
dictionary = sortByLength(dictionary)

words = readFiletoList(passin, 'WINDOWS-1252')
words = sortByLength(words)

result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)

def readFiletoList(fname, fencode='utf-8'):
  linelist = list()
  with open(fname, encoding=fencode) as f:
    for line in f:
      linelist.append(line.strip())
  return linelist


def sortByLength(words):
  '''Takes an ordered iterable and sorts it based on word length'''
  return sorted(words, key=len)

def mpmine(word, dictionary):
  '''Takes a tuple of length 2 with it's arguments.

  At least dictionary needs to be sorted by word length. If not, whacky results ensue.
  '''
  results = dict()
  for pw in word:
    pwlen = len(pw)
    pwres = list()
    for word in dictionary:
      if len(word) > pwlen:
        break
      if word in pw:
        pwres.append(word)
    if len(pwres) > 0:
      results[pw] = pwres
  return results



if __name__ == '__main__':
  main()

字典和单词都是字符串列表。这导致只使用一个进程而不是我设置的数量。如果我去掉变量“words”中的方括号,它似乎会依次迭代每个字符串的字符并导致混乱。

我希望发生的是,从单词中取出 1000 个字符串并将它们传递到工作进程中,然后获取结果,因为这是一个可笑的并行任务。

编辑:添加更多代码以使发生的事情更加清晰。

I'm having a funny issue with map_async that i can't figure out.

I'm using python's multiprocessing library with process pools. I'm trying to pass a list of strings to compare against and a list of strings to be compared to a function using map_async()

right now i have:

from multiprocessing import Pool, cpu_count
import functools

dictionary = /a/file/on/my/disk
passin = /another/file/on/my/disk

num_proc = cpu_count()

dictionary = readFiletoList(fdict)
dictionary = sortByLength(dictionary)

words = readFiletoList(passin, 'WINDOWS-1252')
words = sortByLength(words)

result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)

def readFiletoList(fname, fencode='utf-8'):
  linelist = list()
  with open(fname, encoding=fencode) as f:
    for line in f:
      linelist.append(line.strip())
  return linelist


def sortByLength(words):
  '''Takes an ordered iterable and sorts it based on word length'''
  return sorted(words, key=len)

def mpmine(word, dictionary):
  '''Takes a tuple of length 2 with it's arguments.

  At least dictionary needs to be sorted by word length. If not, whacky results ensue.
  '''
  results = dict()
  for pw in word:
    pwlen = len(pw)
    pwres = list()
    for word in dictionary:
      if len(word) > pwlen:
        break
      if word in pw:
        pwres.append(word)
    if len(pwres) > 0:
      results[pw] = pwres
  return results



if __name__ == '__main__':
  main()

Both dictionary and words are lists of strings. This results in only one process being used instead of the amount I have set. If i take the square brackets off the variable 'words' it seems to iterate through each string's characters in turn and cause a mess.

What i would like to have happen is it take like 1000 strings out of words and pass them into the worker process and then get the results, because this is a ridiculously parallelisable task.

EDIT: Added more code to make what's going on more clear.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

椵侞 2024-11-23 20:53:06

好吧,其实这个问题是我自己想出来的。我只会在这里为可能遇到同样问题的其他人发布答案。我遇到问题的原因是因为 map_async 从列表中获取一项(在本例中是一个字符串),并将其输入到函数中,该函数需要一个字符串列表。所以它基本上将每个字符串视为字符列表。 mpmine 的更正代码是:

def mpmine(word, dictionary):
  '''Takes a tuple of length 2 with it's arguments.

  At least dictionary needs to be sorted by word length. If not, whacky results ensue.
  '''
  results = dict()
  pw = word
  pwlen = len(pw)
  pwres = list()
  for word in dictionary:
    if len(word) > pwlen:
      break
    if word in pw:
      pwres.append(word)
  if len(pwres) > 0:
    results[pw] = pwres
  return results

我希望这可以帮助其他面临类似问题的人。

Ok, i actually figured this one out myself. I'm only going to post the answer here for anyone else who might come along and have the same issue. The reason i was having problems was because map_async takes one item from the list (in this case a string), and feeds it into the function, which was expecting a list of strings. so it then was treating each string as a list of chars basically. the corrected code for mpmine is:

def mpmine(word, dictionary):
  '''Takes a tuple of length 2 with it's arguments.

  At least dictionary needs to be sorted by word length. If not, whacky results ensue.
  '''
  results = dict()
  pw = word
  pwlen = len(pw)
  pwres = list()
  for word in dictionary:
    if len(word) > pwlen:
      break
    if word in pw:
      pwres.append(word)
  if len(pwres) > 0:
    results[pw] = pwres
  return results

I hope this helps anyone else facing a similar issue.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文