TensorFlow: Problem with determining the batch size in a custom loss function during model fitting (batch size "None")


I'm trying to create a custom loss function, in which I have to slice the tensors multiple times. One example is listed below:

# Since different nodes need different activations, I decided to just do it like this
def activations(y_true, y_pred):
    n = y_true.shape[1]
    means = tf.slice(y_pred, begin=[0,0], size=[y_pred.shape[0], n])
    stdevs = tf.slice(y_pred, begin=[0,n], size=[y_pred.shape[0], n])
    corrs = tf.slice(y_pred, begin=[0,2*n], size=[y_pred.shape[0], y_pred.shape[1]-2*n])
    stdevs = keras.activations.softplus(stdevs)
    corrs = keras.activations.tanh(corrs)
    return means, stdevs, corrs

This (and the entire loss function) works fine when tested manually on self-made tensors y_true and y_pred, but when it is used inside a loss function it raises an error during model fitting (compiling goes fine):

    File <filename>, line 105, in activations  *
        means = tf.slice(y_pred, begin=[0,0], size=[y_true.shape[0], n])

    TypeError: Expected int32 passed to parameter 'size' of op 'Slice', got [None, 3] of type 'list' instead. Error: Expected int32, but got None of type 'NoneType'.

So apparently, it can't determine the batch size when executed inside a loss layer.

How do I solve this?

(Note: I'm not looking for a solution to this specific code only, since I slice my tensors quite a lot; I'm looking for a general solution to slicing.)

I tried to look at this and this and I read through this post. Is writing a custom generator to make the batch size static really the only way to do this?
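
For reference, one pattern that sidesteps the static batch size altogether (a sketch based on documented tf.slice semantics, not code from the original post): tf.slice accepts -1 in size to mean "everything that remains in this dimension", ordinary Python slicing works fine on a tensor whose batch axis is None, and tf.shape(y_pred)[0] yields the batch size as a run-time tensor when one is really needed.

import tensorflow as tf

# A sketch of batch-size-agnostic slicing; only n (the feature count) is static.
def activations_dynamic(y_true, y_pred):
    n = y_true.shape[1]
    # size=-1 takes everything remaining in that dimension, so the unknown
    # (None) batch dimension never has to be spelled out:
    means  = tf.slice(y_pred, begin=[0, 0],     size=[-1, n])
    stdevs = tf.slice(y_pred, begin=[0, n],     size=[-1, n])
    corrs  = tf.slice(y_pred, begin=[0, 2 * n], size=[-1, -1])
    # Equivalently, with ordinary Python slicing:
    #   means, stdevs, corrs = y_pred[:, :n], y_pred[:, n:2 * n], y_pred[:, 2 * n:]
    # And if a concrete batch-size tensor is ever required:
    #   batch = tf.shape(y_pred)[0]  # int32 tensor, resolved at run time
    return means, tf.math.softplus(stdevs), tf.math.tanh(corrs)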

Thanks in advance!

EDIT:
Here's a (hugely) simplified version of the code, that triggers the error.

import numpy as np
import numpy.random as npr

import keras
from keras import layers

import tensorflow as tf

# Since different nodes need different activations, I decided to just do it like this
def dummy_loss_func(y_true, y_pred):
    n = y_true.shape[1]
    means = tf.slice(y_pred, begin=[0,0], size=[y_pred.shape[0], n])
    stdevs = tf.slice(y_pred, begin=[0,n], size=[y_pred.shape[0], n]) #I'm assuming these are all (0, infty)
    corrs = tf.slice(y_pred, begin=[0,2*n], size=[y_pred.shape[0], y_pred.shape[1]-2*n])
    
    stdevs = keras.activations.softplus(stdevs)
    corrs = keras.activations.tanh(corrs)
    
    relErrors = tf.math.square(means - y_true)/stdevs
    return tf.reduce_mean(tf.math.square(relErrors))

def dummy_model(dim):
    model = keras.Sequential(
    [
        keras.Input(shape=(1,)),
        layers.Dense(2*dim + int(round(dim * (dim-1)/2)), kernel_initializer = tf.keras.initializers.GlorotUniform()),
    ]
    )
    model.summary()
    model.compile(loss=dummy_loss_func, optimizer="adam")
    return model

# Generating some fake data
n = 5000
dim = 3
pts = npr.uniform(size=[n, 2*dim + int(round(dim * (dim-1)/2))])
dummy_in = np.zeros(n)
print(dummy_in.size)
print(pts.size)

# Compiling the model goes fine
model = dummy_model(dim)

# Model execution will go fine
print(model.predict([0]))

# Just calling the loss function also works
print(dummy_loss_func(tf.constant([[3., 2., 1.],[1., 2., 3.]]), tf.constant([[2., 1., 1., 5., 3., 2., 3., 2., 1.], [2., 5., 1., 1., 3., 6., 3., 4., 1.]])))

# The error only comes here
model.fit(dummy_in, pts, verbose=1)


Answer by 凑诗 (2025-02-08 01:52:46):


Let's work through this together. Likely both of us will need to edit things back and forth.

I'm going to address the slice part of your question, since that was the most tractable given the information.

Let's instantiate a tensor of shape [3, 3, 3]:

y = tf.constant([ [[1, 2, 3]   , [4, 5, 6   ], [7, 8, 9   ]],                                                                                                          
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]],                                                                                                 
                  [[19, 20, 21], [22, 23, 24], [25, 26, 27]] ]) 

Notice that this is 1 tensor of shape [3, 3, 3]. Let's visualize it:

[ins] In [50]: y[0]                                                                                                                                                         
Out[50]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[1, 2, 3],                                                                                                                                                           
       [4, 5, 6],                                                                                                                                                           
       [7, 8, 9]], dtype=int32)>                                                                                                                                            
                                                                                                                                                                            
[ins] In [51]: y[1]                                                                                                                                                         
Out[51]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[10, 11, 12],                                                                                                                                                        
       [13, 14, 15],                                                                                                                                                        
       [16, 17, 18]], dtype=int32)>                                                                                                                                         
                                                                                                                                                                            
[ins] In [52]: y[2]                                                                                                                                                         
Out[52]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[19, 20, 21],                                                                                                                                                        
       [22, 23, 24],                                                                                                                                                        
       [25, 26, 27]], dtype=int32)>                       

In terms of axes, we can imagine the left-most axis containing 3 3x3 matrices, which we referenced above using y[0], y[1], and y[2]. Now let's carve this cube of numbers.

[nav] In [53]: tf.slice(y, begin=[0, 0, 0], size=[2, 2, 2])                                                                                                                 
Out[53]:                                                                                                                                                                    
<tf.Tensor: shape=(2, 2, 2), dtype=int32, numpy=                                                                                                                            
array([[[ 1,  2],                                                                                                                                                           
        [ 4,  5]],                                                                                                                                                          
                                                                                                                                                                            
       [[10, 11],                                                                                                                                                           
        [13, 14]]], dtype=int32)>                                                                                                                                           
                                            

What's happening here is that we're asking for a smaller cube from the bigger cube, specifically one of shape [2, 2, 2] starting from the point [0, 0, 0]. So we are going to make three cuts to the bigger cube. First we go two steps along the depth axis (into the screen), so nothing from the deepest layer shows up (the numbers [19, 20, 21], [22, 23, 24], [25, 26, 27], of shape [3, 3]). Then we make a horizontal cut, which means none of the numbers from [7, 8, 9], [16, 17, 18] show up; [25, 26, 27] was already chopped off in the first cut. Lastly, we make a vertical cut two steps from the origin, ensuring [3, 6], [12, 15] don't show up. So we lose nine numbers in the first chop. We would have lost nine in the second, but three overlapped with the first chop, so we only lose six. In the third chop we would again have lost nine, but three were already gone from the first chop and two from the second (it would have been three, but one overlaps with the first), which leaves four lost in the last chop. 27 - (9 + 6 + 4) = 8, which is what we got.
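
The same block can be taken with plain indexing; a quick sketch to confirm the equivalence (tf.debugging.assert_equal raises if the two tensors differ):

sub = y[:2, :2, :2]  # Python slice syntax lowers to tf.strided_slice
tf.debugging.assert_equal(sub, tf.slice(y, begin=[0, 0, 0], size=[2, 2, 2]))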

One of the key questions to ask is: do I have a whole batch here, or a single observation from the batch I'm handling? How can you tell? The left-most axis is the batch axis, and it's generally represented as None, meaning the number of observations in a batch can vary. Let's make a batch out of the tensor we have, which you can do as follows:

[ins] In [57]: tf.reshape(y, shape=(-1, 3, 3, 3))                                                                                                                           
Out[57]:                                                                                                                                                                    
<tf.Tensor: shape=(1, 3, 3, 3), dtype=int32, numpy=                                                                                                                         
array([[[[ 1,  2,  3],                                                                                                                                                      
         [ 4,  5,  6],                                                                                                                                                      
         [ 7,  8,  9]],                                                                                                                                                     
                                                                                                                                                                            
        [[10, 11, 12],                                                                                                                                                      
         [13, 14, 15],                                                                                                                                                      
         [16, 17, 18]],                                                                                                                                                     
                                                                                                                                                                            
        [[19, 20, 21],                                                                                                                                                      
         [22, 23, 24],                                                                                                                                                      
         [25, 26, 27]]]], dtype=int32)>                                                                                                                                     
                                                                                                                                                                            
[ins] In [58]: tf.reshape(y, shape=(-1, 3, 3, 3)).shape                                                                                                                     
Out[58]: TensorShape([1, 3, 3, 3])                 

What the above says is: reshape my data so that I have a 3x3x3 cube, but also give me a left-most (batch) axis. Since there are 27 numbers, the reshape just "deepens" the dimensions by one size-1 axis, which you can see as the extra pair of [ ]s in the output above. It can't manufacture numbers for us, after all; these are our observations. You can also use tf.expand_dims, but I find tf.reshape more intuitive.
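
For completeness, a sketch of the tf.expand_dims equivalent:

batched = tf.expand_dims(y, axis=0)  # adds a leading batch axis of size 1, giving shape (1, 3, 3, 3)
tf.debugging.assert_equal(batched, tf.reshape(y, (-1, 3, 3, 3)))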

Now we have a batch of size 1, where each observation is a cube of shape [3, 3, 3], which can be assigned to y_pred if you like. Try running the batch through your functions and see how it behaves. Another thing I have found super helpful for shape issues is using ipdb and embed mode in IPython: you can set a breakpoint, step into the offending line, observe, and fix. Best of luck!
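
To make that None batch axis visible, here is a small sketch (assuming TensorFlow 2.x): a loss function passed to compile is traced into a graph, and during tracing the static shape of the batch axis is unknown, while tf.shape is only resolved at run time.

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def show_shapes(t):
    print("static shape while tracing:", t.shape)        # (None, 3), printed once at trace time
    tf.print("dynamic shape at run time:", tf.shape(t))  # e.g. [2 3]
    return t

show_shapes(tf.zeros([2, 3]))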

Solution (w/o any fundamental domain knowledge. Apparently tensors are domain agnostic :) )

# Feed the data through tf.data so each input element carries an explicit feature axis
pts_tensor = tf.constant(pts)
dummy_in_tensor = tf.reshape(dummy_in, (-1, 1))  # tf.reshape already returns a tensor
my_ds = tf.data.Dataset.from_tensor_slices((dummy_in_tensor, pts_tensor))
model.fit(my_ds, verbose=1)
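
One caveat (an assumption on my part, not verified against the run above): model.fit expects a tf.data.Dataset to already yield batches, so an explicit batch size is usually added, e.g. model.fit(my_ds.batch(32), verbose=1) with a hypothetical batch size of 32.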

I think the issue was with the batch axis. To do any better, I'd need to understand the domain, but I've got some studying to do :)
