Trying to work around a numpy.core._exceptions._ArrayMemoryError in my code

Posted on 2025-01-25 22:56:09 · 1475 words · 1 view · 0 comments

I have a data frame -> data with the shape (10000, 257). I need to preprocess this data frame so that I can use it in an LSTM, which requires a 3-dimensional input - (nrows, ntimesteps, nfeatures). I am working with the code snippet provided here:

def univariate_processing(variable, window):
   import numpy as np

   # create empty 2D matrix from variable
   V = np.empty((len(variable)-window+1, window))

   # take each row/time window
   for i in range(V.shape[0]):
      V[i,:] = variable[i : i+window]

   V = V.astype(np.float32) # set common data type
   return V

def RNN_regprep(df, y, len_input, len_pred): #, test_size):
    # create 3D matrix for multivariate input
    X = np.empty((df.shape[0]-len_input+1, len_input, df.shape[1]))

    # Iterate univariate preprocessing on all variables - store them in XM
    for i in range(df.shape[1]):
        X[ : , : , i ] = univariate_processing(df[:,i], len_input)

    # create 2D matrix of y sequences
    y = y.reshape((-1,))  # reshape to 1D if needed
    Y = univariate_processing(y, len_pred)

    ## Trim dataframes as explained
    X = X[ :-(len_pred + 1) , : , : ]
    Y = Y[len_input:-1 , :]

    # Set common datatype
    X = X.astype(np.float32)
    Y = Y.astype(np.float32)

    return X, Y

X, y = RNN_regprep(data, label, len_input=200, len_pred=1)

While running this, the following error is obtained:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 28.9 GiB for an array with shape (10000, 200, 257) and data type float64

I do understand that this is more of a memory issue on my server. I want to know whether there is anything I can change within my code to avoid this memory error, or at least reduce the memory consumption.
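One detail worth noting about the traceback: `np.empty` defaults to float64, and the code above only casts to float32 after the full-size buffer has been filled, so the peak allocation is double what the final float32 arrays need. A minimal sketch, with deliberately small stand-in dimensions instead of the real (10000, 200, 257):

```python
import numpy as np

# np.empty defaults to float64; the .astype(np.float32) cast happens only
# after the full buffer already exists, so the float64 peak is what fails.
# Allocating in float32 from the start halves that peak.
nrows, len_input, nfeat = 500, 50, 8   # small stand-ins for 10000, 200, 257
X64 = np.empty((nrows - len_input + 1, len_input, nfeat))
X32 = np.empty((nrows - len_input + 1, len_input, nfeat), dtype=np.float32)
print(X64.nbytes / X32.nbytes)  # 2.0 - float32 needs half the memory
```

This alone does not make a 3D copy of the data cheap, but it halves the size of every buffer the code allocates.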



Comments (1)

黄昏下泛黄的笔记 2025-02-01 22:56:09

This is what windowed views are for. Using my recipe here:

var = np.random.rand(10000,257)
w = window_nd(var, 200, axis = 0)

Now you have a windowed view over var:

w.shape
Out[]: (9801, 200, 257)

But, importantly, it uses the exact same data as var, just looking into it in a windowed way:

w.__array_interface__['data'] #This is the memory's starting address
Out[]: (1448954720320, False)

var.__array_interface__['data']
Out[]: (1448954720320, False)

np.shares_memory(var, w)
Out[]: True

w.base.base.base is var  #(lots of rearranging views in the background)
Out[]: True

So you can do:

def univariate_processing(variable, window):
   return window_nd(variable, window, axis = 0)

That should significantly reduce your memory allocation, no "magic" required :)
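The link behind "my recipe here" did not survive the page scrape. If you cannot locate the `window_nd` recipe, NumPy 1.20+ ships an equivalent built-in, `np.lib.stride_tricks.sliding_window_view`; a sketch of the same zero-copy windowing with it (note it puts the window axis last, so a transpose is needed to recover the (nrows, ntimesteps, nfeatures) layout):

```python
import numpy as np

var = np.random.rand(10000, 257)

# Built-in zero-copy windowing (NumPy >= 1.20); the window axis comes out
# last, so move it to the middle to get (nrows, ntimesteps, nfeatures):
w = np.lib.stride_tricks.sliding_window_view(var, 200, axis=0)
w = w.transpose(0, 2, 1)

print(w.shape)                    # (9801, 200, 257)
print(np.shares_memory(var, w))  # True - same underlying buffer as var
```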

You can also try

from skimage.util import view_as_windows
w = np.squeeze(view_as_windows(var, (200, 1)))

Which does almost the same thing. In this case, your answer would be:

def univariate_processing(variable, window):
   from skimage.util import view_as_windows
   window = (window,) + (1,)*(len(variable.shape)-1)
   return np.squeeze(view_as_windows(variable, window))
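One caveat worth adding (my note, not part of the original answer): if the view-based `univariate_processing` is still fed into `RNN_regprep` as written, the assignment `X[:, :, i] = ...` copies each view into the preallocated full-size `X`, and the large allocation happens anyway. To actually benefit, window the whole 2-D array at once and keep the result as a view; a sketch using NumPy's built-in `sliding_window_view` as a stand-in for the recipe:

```python
import numpy as np

# Windowing the whole 2-D frame at once keeps everything a view;
# column-by-column assignment into a preallocated X would copy instead.
data = np.random.rand(1000, 257).astype(np.float32)
X = np.lib.stride_tricks.sliding_window_view(data, 200, axis=0)
X = X.transpose(0, 2, 1)          # -> (nrows, ntimesteps, nfeatures)

print(X.shape)                    # (801, 200, 257)
print(np.shares_memory(data, X))  # True: no new data buffer was allocated
```

The trimming and the y-sequence handling from `RNN_regprep` can then be applied to this view; only slicing (not fancy indexing) keeps it copy-free.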
