用Xarray -Compute()结果延迟的dask仍然延迟
我试图在两个数据集上使用Dask和Xarray进行一些分析(例如AVG),然后计算两个结果之间的差异。
这是我的代码,
cluster = LocalCluster(n_workers=5, threads_per_worker=3, **worker_kwargs)
def calc_avg(path):
mean = xr.open_mfdataset( path,combine='nested', concat_dim="time", parallel=True, decode_times=False, decode_cf=False)['var'].sel(lat=slice(south,north), lon=slice(west,east)).mean(dim='time')
return mean
def diff_(x,y):
return x-y
p1 = "/path/to/first/multi-file/dataset"
p2 = "/path/to/second/multi-file/dataset"
a = dask.delayed(calc_avg)(p1)
b = dask.delayed(calc_avg)(p2)
total = dask.delayed(diff_)(a,b)
result = total.compute()
此处的执行时间是17s。
但是,绘制结果(result.plot()
)需要超过1分钟,因此似乎在尝试绘制结果时实际上发生了计算。
这是使用DASK延迟的正确方法吗?
I tried to perform with Dask and xarray some analysis (e.g. avg) over two datasets, then compute a difference between the two results.
This is my code
cluster = LocalCluster(n_workers=5, threads_per_worker=3, **worker_kwargs)
def calc_avg(path):
mean = xr.open_mfdataset( path,combine='nested', concat_dim="time", parallel=True, decode_times=False, decode_cf=False)['var'].sel(lat=slice(south,north), lon=slice(west,east)).mean(dim='time')
return mean
def diff_(x,y):
return x-y
p1 = "/path/to/first/multi-file/dataset"
p2 = "/path/to/second/multi-file/dataset"
a = dask.delayed(calc_avg)(p1)
b = dask.delayed(calc_avg)(p2)
total = dask.delayed(diff_)(a,b)
result = total.compute()
The executiuon time here is 17s.
However, plotting the result (result.plot()
) takes more than 1 min, so it seems that the calculation actually happens when trying to plot the result.
Is this the proper way to use Dask delayed?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您正在打电话给 > ,它本身就是
延迟
函数中的Dask操作。因此,当您调用result.com pupute
时,您正在执行函数calc_avg
和平均值
。但是,calc_avg
返回DASK支持的数据。是的,17S任务将计划的延迟
calc_avg
和earne
的dask图将其定为计划dask.array
open_mfdataset
和Array OPS的DASK图。要解决此问题,请放下延迟的包装器,然后简单地使用
dask.Array
Xarray工作流程:请参阅 xarray指南与dask的并行计算指南进行简介。
You’re wrapping a call to
xr.open_mfdataset
, which is itself a dask operation, in adelayed
function. So when you callresult.compute
, you’re executing the functionscalc_avg
andmean
. However,calc_avg
returns a dask-backed DataArray. So yep, the 17s task converts the scheduleddelayed
dask graph ofcalc_avg
andmean
into a scheduleddask.array
dask graph ofopen_mfdataset
and array ops.To resolve this, drop the delayed wrappers and simply use the
dask.array
xarray workflow:See the xarray guide to parallel computing with dask for an introduction.