Problem with xarray.open_rasterio(), .img files, and multiprocessing Pool

Posted on 2025-01-09 23:47:31


I am trying to use multiprocessing Pool.map() to speed up my code. In the function where computation occurs for each process, I reference an xarray.DataArray that was opened using xarray.open_rasterio(). However, I receive errors similar to this:

rasterio.errors.RasterioIOError: Read or write failed. /net/home_stu/cfite/data/CDL/2019/2019_30m_cdls.img, band 1: IReadBlock failed at X offset 190, Y offset 115: Unable to open external data file: /net/home_stu/cfite/data/CDL/2019/

I assume this is some issue related to the same file being referenced simultaneously while another worker is opening it too? I use DataArray.sel() to select small portions of the raster grid that I work with, since the entire .img file is way too big to load all at once. I have tried opening the .img file in the main code and then just referencing it in my function, and I've tried opening/closing it in the function that is being passed to Pool.map() - and I receive errors like this regardless. Is my file corrupted, or will I just not be able to work with this file using multiprocessing Pool? I am very new to working with multiprocessing, so any advice is appreciated. Here is an example of my code:

import pandas as pd
import xarray as xr
import numpy as np
from multiprocessing import Pool

def select_grid(x, y):
    ds = xr.open_rasterio('myrasterfile.img')  # opening large file with xarray
    grid = ds.sel(x=slice(x, x + 50), y=slice(y, y + 50))
    ds.close()
    return grid

def myfunction(row):
    x = row.x
    y = row.y
    mygrid = select_grid(x, y)
    my_calculation = mygrid.sum()  # example calculation, but really I am doing multiple calculations
    my_calculation.to_csv('filename.csv')

with Pool(30) as p: 
    p.map(myfunction, list_of_df_rows)
