Some questions about TensorFlow Keras layers

Published 2025-01-23 05:58:18


I am new to NLP. I ran into some problems while going through someone's code on GitHub. The code is about NL2SQL. The author processed the dataset like this.

Output token_ids:
[101, 753, 7439, 671, 736, 2399, 5018, 1724, 1453, 1920, 7942, 6044, 1469, 2166, 2147, 6845, 4495, 6821, 697, 6956, 2512, 4275, 4638, 4873, 2791, 2600, 1304, 3683, 3221, 1914, 2208, 1435, 102, 11, 2512, 4275, 1399, 4917, 102, 12, 1453, 4873, 2791, 102, 12, 4873, 2791, 1304, 3683, 102, 12, 1767, 1772, 782, 3613, 102]
Output segment_ids:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Output header_ids:
[33 39 44 50]

He appended the column headers after the question, then fed the whole sequence into BERT to be encoded. After processing, the input dataset looks like this (an example with batch size 2).

input_token_ids : shape(2, 57)
[[ 101  753 7439  671  736 2399 5018 1724 1453 1920 7942 6044 1469 2166
  2147 6845 4495 6821  697 6956 2512 4275 4638 4873 2791 2600 1304 3683
  3221 1914 2208 1435  102   12 1767 1772  782 3613  102   12 4873 2791
  1304 3683  102   12 1453 4873 2791  102   11 2512 4275 1399 4917  102
     0]
 [ 101  872 1962 8024  872 4761 6887  791 2399 5018 1724 1453 2166 2147
  6845 4495 8024 6820 3300 6929 6956 1920 7942 6044 2124  812 4873 2791
  2600 4638 1304 3683 1408  102   12 1767 1772  782 3613  102   12 4873
  2791 1304 3683  102   12 1453 4873 2791  102   11 2512 4275 1399 4917
   102]]
input_segment_ids : shape(2, 57)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
input_header_ids : shape(2, 4)
[[33 39 45 50]
 [34 40 46 51]]
input_header_mask : shape(2, 4)
[[1 1 1 1]
 [1 1 1 1]]
output_sel_agg : shape(2, 4, 1)
[[[6]
  [5]
  [6]
  [6]]

 [[6]
  [5]
  [6]
  [6]]]
output_cond_conn_op : shape(2, 1)
[[2]
 [2]]
output_cond_op : shape(2, 4, 1)
[[[4]
  [4]
  [4]
  [2]]

 [[4]
  [4]
  [4]
  [2]]]
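The packing step described above can be sketched roughly like this. This is a hypothetical reconstruction, not the repo's code: the marker ids 11 and 12 and `[CLS]`/`[SEP]` ids 101/102 are taken from the dumps above, and the helper name and toy token ids are illustrative.

```python
# Hypothetical sketch of the preprocessing: the question and the table's
# column headers are packed into one token sequence, and the position of
# each header's marker token is recorded as its header id.
CLS, SEP = 101, 102
COL_MARK = 11  # assumed marker before the first header
COL_SEP = 12   # assumed marker before each subsequent header

def build_inputs(question_ids, header_ids_per_column):
    """question_ids: token ids of the question.
    header_ids_per_column: one list of token ids per column header."""
    token_ids = [CLS] + question_ids + [SEP]
    header_positions = []
    for i, col in enumerate(header_ids_per_column):
        marker = COL_MARK if i == 0 else COL_SEP
        header_positions.append(len(token_ids))  # index of the marker token
        token_ids += [marker] + col + [SEP]
    segment_ids = [0] * len(token_ids)  # single segment, as in the dump
    return token_ids, segment_ids, header_positions
```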

And in the prediction part, he extracted the column embedding vectors using the ids he preprocessed, multiplied them by the mask, and then sent them into the Dense layer. The model structure looks like this.

def seq_gather(x):
    seq, idxs = x
    idxs = K.cast(idxs, 'int32')
    return tf.gather_nd(seq, idxs)

bert_model = load_trained_model_from_checkpoint(paths.config, paths.checkpoint, seq_len=None)
for l in bert_model.layers:
    l.trainable = True
    
inp_token_ids = Input(shape=(None,), name='input_token_ids', dtype='int32')
inp_segment_ids = Input(shape=(None,), name='input_segment_ids', dtype='int32')
inp_header_ids = Input(shape=(None,), name='input_header_ids', dtype='int32')
inp_header_mask = Input(shape=(None, ), name='input_header_mask')

x = bert_model([inp_token_ids, inp_segment_ids]) # (None, seq_len, 768)

# predict cond_conn_op
x_for_cond_conn_op = Lambda(lambda x: x[:, 0])(x) # (None, 768)
p_cond_conn_op = Dense(num_cond_conn_op, activation='softmax', name='output_cond_conn_op')(x_for_cond_conn_op)

# predict sel_agg
x_for_header = Lambda(seq_gather)([x, inp_header_ids]) # (None, h_len, 768)
header_mask = Lambda(lambda x: K.expand_dims(x, axis=-1))(inp_header_mask) # (None, h_len, 1)

#x_for_header = tf.keras.layers.Multiply()([x_for_header,header_mask])
#x_for_header = Masking()(x_for_header)

p_sel_agg = Dense(num_sel_agg, activation='softmax', name='output_sel_agg')(x_for_header)

x_for_cond_op = Concatenate(axis=-1)([x_for_header, p_sel_agg])

p_cond_op = Dense(num_cond_op, activation='softmax', name='output_cond_op')(x_for_cond_op)


model = Model(
    [inp_token_ids, inp_segment_ids, inp_header_ids, inp_header_mask],
    [p_cond_conn_op, p_sel_agg, p_cond_op]
)
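For reference, the per-sample gather that `seq_gather` is apparently intended to perform can be illustrated in plain NumPy (a standalone toy, not the repo's code; the shapes are stand-ins for `(batch, seq_len, 768)` and `(batch, h_len)`):

```python
import numpy as np

# seq: (batch, seq_len, dim) token embeddings; idxs: (batch, h_len) positions.
# For each sample b, pick the rows seq[b, idxs[b]] -> (batch, h_len, dim).
seq = np.arange(2 * 5 * 3, dtype=np.float32).reshape(2, 5, 3)
idxs = np.array([[0, 2], [1, 4]])
gathered = np.take_along_axis(seq, idxs[:, :, None], axis=1)
print(gathered.shape)  # (2, 2, 3)
```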

But when I ran the code, it raised an error saying that the multiply needs a specific input shape, while the number of columns varies across samples. When I removed that line and continued (to check whether the results could be the same even without the masking), it raised another error: the last dimension of the input to the Dense layer should be defined.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-b0a3d0700bdf> in <module>()
     10 #x_for_header = Masking()(x_for_header)
     11 
---> 12 p_sel_agg = Dense(num_sel_agg, activation='softmax', name='output_sel_agg')(x_for_header)
     13 
     14 x_for_cond_op = Concatenate(axis=-1)([x_for_header, p_sel_agg])

1 frames
/usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py in build(self, input_shape)
    137     last_dim = tf.compat.dimension_value(input_shape[-1])
    138     if last_dim is None:
--> 139       raise ValueError('The last dimension of the inputs to a Dense layer '
    140                        'should be defined. Found None. '
    141                        f'Full input shape received: {input_shape}')

ValueError: The last dimension of the inputs to a Dense layer should be defined. Found None. Full input shape received: <unknown>

So I wonder if something in the code is wrong, maybe the gather function? Or is it just the Keras version, since this code was pushed to GitHub in 2019? Is there some other way to achieve this?
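A minimal sketch of one possible alternative, assuming a tf.keras 2.x setup (untested against the original repo; the input names and the toy dimension 8, standing in for 768, are illustrative): use `tf.gather` with `batch_dims=1` so each sample gathers its own header positions, and pass `output_shape` to the `Lambda` so the following `Dense` layer sees a defined last dimension.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Input, Lambda
from tensorflow.keras.models import Model

def seq_gather(x):
    seq, idxs = x  # seq: (batch, seq_len, dim), idxs: (batch, h_len)
    idxs = K.cast(idxs, 'int32')
    # batch_dims=1 gathers each sample's own header positions, instead of
    # interpreting each row of idxs as multi-dimensional coordinates.
    return tf.gather(seq, idxs, batch_dims=1)

# Toy stand-in for the BERT output, just to show the shapes work out.
inp_seq = Input(shape=(None, 8), name='seq')
inp_header_ids = Input(shape=(None,), name='header_ids', dtype='int32')

# output_shape tells Keras the last dimension is 8, so Dense can build
# even though h_len stays dynamic.
x_for_header = Lambda(seq_gather, output_shape=(None, 8))([inp_seq, inp_header_ids])
p_sel_agg = Dense(7, activation='softmax')(x_for_header)

model = Model([inp_seq, inp_header_ids], p_sel_agg)
out = model.predict([np.zeros((2, 10, 8), np.float32),
                     np.array([[1, 3], [2, 4]], np.int32)])
print(out.shape)  # (2, 2, 7)
```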

Thanks in advance.
