Issue with xarray.open_rasterio() .img file and multiprocessing Pool
I am trying to use multiprocessing Pool.map() to speed up my code. In the function where the computation occurs for each process, I reference an xarray.DataArray that was opened using xarray.open_rasterio(). However, I receive errors similar to this:
rasterio.errors.RasterioIOError: Read or write failed. /net/home_stu/cfite/data/CDL/2019/2019_30m_cdls.img, band 1: IReadBlock failed at X offset 190, Y offset 115: Unable to open external data file: /net/home_stu/cfite/data/CDL/2019/
I assume this is some issue related to the same file being referenced simultaneously while another worker is also opening it? I use DataArray.sel() to select the small portions of the raster grid that I work with, since the entire .img file is way too big to load all at once. I have tried opening the .img file in the main code and then just referencing it in my function (a sketch of that variant follows the example below), and I've tried opening/closing it in the function that is passed to Pool.map(); either way I receive errors like this. Is my file corrupted, or will I just not be able to work with this file using a multiprocessing Pool? I am very new to working with multiprocessing, so any advice is appreciated. Here is an example of my code:
import pandas as pd
import xarray as xr
import numpy as np
from multiprocessing import Pool

def select_grid(x, y):
    ds = xr.open_rasterio('myrasterfile.img')  # opening the large file with xarray
    grid = ds.sel(x=slice(x, x + 50), y=slice(y, y + 50))
    ds.close()
    return grid

def myfunction(row):
    x = row.x
    y = row.y
    mygrid = select_grid(x, y)
    my_calculation = mygrid.sum()  # example calculation, but really I am doing multiple calculations
    my_calculation.to_csv('filename.csv')

with Pool(30) as p:
    p.map(myfunction, list_of_df_rows)
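For completeness, the other variant I mentioned (opening the file once in the main code and referencing it from the worker function) looked roughly like this. This is only a sketch of that attempt: the function name myfunction_shared, returning the result instead of writing a CSV, and the way list_of_df_rows is built are just illustrative guesses, not my exact code.

import pandas as pd
import xarray as xr
from multiprocessing import Pool

ds = xr.open_rasterio('myrasterfile.img')  # opened once in the main code, before the Pool starts

def myfunction_shared(row):  # illustrative name, not my real function
    # each worker slices the already-open DataArray instead of reopening the file
    grid = ds.sel(x=slice(row.x, row.x + 50), y=slice(row.y, row.y + 50))
    return grid.sum()  # placeholder for the real calculations

# guessing list_of_df_rows is built from a DataFrame of x/y offsets, e.g. via itertuples()
df = pd.DataFrame({'x': [0, 50, 100], 'y': [0, 50, 100]})
list_of_df_rows = list(df.itertuples())

with Pool(30) as p:
    results = p.map(myfunction_shared, list_of_df_rows)

Whether I open the file once like this or open/close it inside the function as in the example above, I end up with the same RasterioIOError.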