Dask Array.compute() peak memory in JupyterLab



I am working with Dask on a distributed cluster, and I noticed a peak in memory consumption when getting the results back to the local process.

My minimal example consists of instantiating the cluster and creating a simple array of ~1.6 GB with dask.array.arange.

I expected the memory consumption to be around the array size, but I observed a memory peak of around 3.2 GB.

Does Dask make any copy during the computation? Or does JupyterLab need to make a copy?

import dask.array
import dask_jobqueue
import distributed

cluster_conf = {
    "cores": 1,
    "log_directory": "/work/scratch/chevrir/dask-workspace",
    "walltime": '06:00:00',
    "memory": "5GB"
}

cluster = dask_jobqueue.PBSCluster(**cluster_conf)
cluster.scale(n=1)
client = distributed.Client(cluster)
client

# ~1.6 GB in memory (2e8 float64 elements × 8 bytes)
a = dask.array.arange(2e8)

%load_ext memory_profiler
%memit a.compute()
# peak memory: 3219.02 MiB, increment: 3064.36 MiB
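
As the answer below explains, the peak comes from the serialized chunk bytes and the assembled output array coexisting on the client. If the goal is simply to keep the client-side peak closer to the array size, one possible workaround (a sketch, not Dask's recommended pattern) is to pull the result back one chunk at a time into a preallocated NumPy array, at the cost of extra scheduler round-trips:

import numpy as np

# Fetch the result chunk by chunk instead of all at once, so the
# client holds the output array plus only one chunk's worth of
# transferred bytes at any time.
out = np.empty(a.shape, dtype=a.dtype)   # preallocate the ~1.6 GB output once
offset = 0
for i in range(a.numblocks[0]):          # iterate over the 1-D array's chunks
    piece = a.blocks[i].compute()        # only this chunk is in flight
    out[offset:offset + piece.shape[0]] = piece
    offset += piece.shape[0]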


Answer by 妄司 (2025-01-18 17:08:01):


What happens when you do compute():

  • the graph of your computation is constructed (this is small) and sent to the scheduler,
  • the scheduler gets workers to produce the pieces of the array, which should total about 1.6 GB on the workers,
  • the client constructs an empty array for the output you are asking for, knowing its type and size,
  • the client receives bunches of bytes across the network or IPC from each worker that holds pieces of the output; these are copied into the client's output array,
  • the complete array is returned to you.

You can see that the penultimate step here necessarily requires duplicating the data. The original byte buffers may eventually be garbage collected later.
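
To make the duplication concrete, here is a toy model of that penultimate step (the chunk list and its layout are made up for illustration; this is not Dask's actual protocol code). The byte payloads received from the workers and the preallocated output array are both alive while the copy happens, which is why the client briefly holds roughly twice the array size:

import numpy as np

# Toy model of the client-side assembly step: simulate two workers
# each sending the raw bytes of one chunk of the result.
received_chunks = [
    (0, 4, np.arange(0, 4, dtype="float64").tobytes()),
    (4, 8, np.arange(4, 8, dtype="float64").tobytes()),
]

out = np.empty(8, dtype="float64")  # the preallocated output array
for start, stop, payload in received_chunks:
    chunk = np.frombuffer(payload, dtype="float64")  # zero-copy view of the bytes
    out[start:stop] = chunk  # the actual copy: payload and out coexist here
# Only after this can the payload byte strings be garbage collected,
# letting memory drop back toward the array size.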
