Extending a series of non-uniform netcdf data in a numpy array

Posted 2024-08-28 22:51:22


I am new to python, apologies if this has been asked already.

Using python and numpy, I am trying to gather data across many netcdf files into a single array by iteratively calling append().

Naively, I am trying to do something like this:

from numpy import *
from pupynere import netcdf_file

x = array([])
y = [...some list of files...]

for file in y:
    ncfile = netcdf_file(file, 'r')
    xFragment = ncfile.variables["varname"][:]
    ncfile.close()
    x = append(x, xFragment)

I know that under normal circumstances this is a bad idea, since it reallocates new memory on each append() call. But two things discourage preallocation of x:

1) The files are not necessarily the same size along axis 0 (but should be the same size along subsequent axes), so I would need to read the array sizes from each file beforehand to precalculate the final size of x.

However...

2) From what I can tell, pupynere (and other netcdf modules) load the entire file into memory upon opening it, rather than holding just a reference (as many netcdf interfaces in other environments do). So to preallocate, I'd have to open each file twice.

There are many (>100) large (>1GB) files, so overallocating and reshaping is not practical, from what I can tell.

My first question is whether I am missing some intelligent way to preallocate.
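For what it's worth, the two-pass idea can be sketched in plain numpy. The `read_shape` and `read_data` callables below are hypothetical placeholders (not pupynere API) standing in for "open the file and get the variable's shape" and "open the file and get the data"; the second open is exactly the cost the question describes:

```python
import numpy as np

# Hypothetical two-pass preallocation sketch. Pass 1 collects only the
# shapes; pass 2 fills a single preallocated array. `read_shape` and
# `read_data` are placeholders, not a real netcdf library API.
def gather(files, read_shape, read_data):
    shapes = [read_shape(f) for f in files]      # (rows_i, *rest) per file
    total = sum(s[0] for s in shapes)            # total rows along axis 0
    out = np.empty((total,) + shapes[0][1:])     # allocate once
    offset = 0
    for f, s in zip(files, shapes):
        out[offset:offset + s[0]] = read_data(f)
        offset += s[0]
    return out

# In-memory stand-ins for two files with different sizes along axis 0
data = {"a": np.arange(6.0).reshape(3, 2), "b": np.arange(4.0).reshape(2, 2)}
x = gather(["a", "b"], lambda f: data[f].shape, lambda f: data[f])
print(x.shape)  # (5, 2)
```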

My second question is more serious. The above snippet works for a one-dimensional array. But if I try to load in a matrix, then initialisation becomes a problem. I can append a one-dimensional array to an empty array:

append( array([]), array([1, 2, 3]) )

but I cannot append an empty array to a matrix:

append( array([]), array([ [1, 2], [3, 4] ]), axis=0)

Something like x.extend(xFragment) would work, I believe, but I don't think numpy arrays have this functionality. I could also avoid the initialisation problem by treating the first file as a special case, but I'd prefer to avoid that if there's a better way to do it.
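As a minimal demonstration of the mismatch: `np.array([])` is 1-D with shape `(0,)`, and `np.append` with an `axis` argument delegates to `np.concatenate`, which requires all inputs to have the same number of dimensions. An empty array with a matching trailing shape sidesteps the problem:

```python
import numpy as np

# Mixing a 1-D empty array with a 2-D matrix fails along axis=0,
# because concatenate requires equal numbers of dimensions.
try:
    np.append(np.array([]), np.array([[1, 2], [3, 4]]), axis=0)
    failed = False
except ValueError:
    failed = True

# Starting from an empty array of shape (0, 2) works:
x = np.empty((0, 2))
x = np.append(x, np.array([[1, 2], [3, 4]]), axis=0)
print(failed, x.shape)  # True (2, 2)
```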

If anyone can offer help or a suggestion, or can identify a problem with my approach, then I'd be grateful. Thanks


Comments (1)

虚拟世界 2024-09-04 22:51:22


You can solve both problems by first loading the arrays from the files into a list of arrays, and then using concatenate to join all the arrays. Something like this:

import numpy as np
from pupynere import netcdf_file

x = []  # a normal python list, not np.array
y = [...some list of files...]

for file in y:
    ncfile = netcdf_file(file, 'r')
    xFragment = ncfile.variables["varname"][:]
    ncfile.close()
    x.append(xFragment)

combined_array = np.concatenate(x, axis=0)
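The same pattern with in-memory fragments standing in for the per-file arrays: the fragment sizes may differ along axis 0, but must agree on the remaining axes.

```python
import numpy as np

# Two fragments of different lengths along axis 0, same trailing shape
fragments = [np.zeros((3, 4)), np.zeros((5, 4))]
combined = np.concatenate(fragments, axis=0)
print(combined.shape)  # (8, 4)
```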