Some questions about TensorFlow Keras layers
I am new to NLP. I ran into some problems while going through someone's code on GitHub. The code is about NL2SQL. The author preprocessed the dataset like this:
Output token_ids:
[101, 753, 7439, 671, 736, 2399, 5018, 1724, 1453, 1920, 7942, 6044, 1469, 2166, 2147, 6845, 4495, 6821, 697, 6956, 2512, 4275, 4638, 4873, 2791, 2600, 1304, 3683, 3221, 1914, 2208, 1435, 102, 11, 2512, 4275, 1399, 4917, 102, 12, 1453, 4873, 2791, 102, 12, 4873, 2791, 1304, 3683, 102, 12, 1767, 1772, 782, 3613, 102]
Output segment_ids:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Output header_ids:
[33 39 44 50]
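As far as I can tell, header_ids are just the positions of the special marker tokens (11 before the first column name, 12 before each following one) in the token sequence. A minimal check against the dump above (the marker values 11/12 are read off from that dump, not from the repo's code):

```python
# token_ids copied from the example output above
token_ids = [101, 753, 7439, 671, 736, 2399, 5018, 1724, 1453, 1920, 7942, 6044,
             1469, 2166, 2147, 6845, 4495, 6821, 697, 6956, 2512, 4275, 4638,
             4873, 2791, 2600, 1304, 3683, 3221, 1914, 2208, 1435, 102, 11,
             2512, 4275, 1399, 4917, 102, 12, 1453, 4873, 2791, 102, 12, 4873,
             2791, 1304, 3683, 102, 12, 1767, 1772, 782, 3613, 102]

# header positions = indices of the column-marker tokens
header_ids = [i for i, t in enumerate(token_ids) if t in (11, 12)]
print(header_ids)  # [33, 39, 44, 50]
```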
He appended the column headers after the question, then sent the whole sequence into BERT to be encoded. After processing, the input data looks like this (an example with batch size 2):
input_token_ids : shape(2, 57)
[[ 101 753 7439 671 736 2399 5018 1724 1453 1920 7942 6044 1469 2166
2147 6845 4495 6821 697 6956 2512 4275 4638 4873 2791 2600 1304 3683
3221 1914 2208 1435 102 12 1767 1772 782 3613 102 12 4873 2791
1304 3683 102 12 1453 4873 2791 102 11 2512 4275 1399 4917 102
0]
[ 101 872 1962 8024 872 4761 6887 791 2399 5018 1724 1453 2166 2147
6845 4495 8024 6820 3300 6929 6956 1920 7942 6044 2124 812 4873 2791
2600 4638 1304 3683 1408 102 12 1767 1772 782 3613 102 12 4873
2791 1304 3683 102 12 1453 4873 2791 102 11 2512 4275 1399 4917
102]]
input_segment_ids : shape(2, 57)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
input_header_ids : shape(2, 4)
[[33 39 45 50]
[34 40 46 51]]
input_header_mask : shape(2, 4)
[[1 1 1 1]
[1 1 1 1]]
output_sel_agg : shape(2, 4, 1)
[[[6]
[5]
[6]
[6]]
[[6]
[5]
[6]
[6]]]
output_cond_conn_op : shape(2, 1)
[[2]
[2]]
output_cond_op : shape(2, 4, 1)
[[[4]
[4]
[4]
[2]]
[[4]
[4]
[4]
[2]]]
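The repo's batching code isn't shown here, but since the number of columns varies between tables, I assume input_header_ids and input_header_mask are padded to the longest header list in the batch, roughly like this (pad_headers is my own illustrative helper, not from the repo):

```python
def pad_headers(header_lists, pad_value=0):
    """Pad variable-length header-position lists to a common length
    and build the matching 1/0 mask."""
    h_len = max(len(h) for h in header_lists)
    ids = [h + [pad_value] * (h_len - len(h)) for h in header_lists]
    mask = [[1] * len(h) + [0] * (h_len - len(h)) for h in header_lists]
    return ids, mask

ids, mask = pad_headers([[33, 39, 45, 50], [34, 40, 46]])
print(ids)   # [[33, 39, 45, 50], [34, 40, 46, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```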
In the prediction part, he extracts the column embedding vectors via the preprocessed header ids, multiplies them by the mask, and then sends them into a Dense layer. The model structure looks like this:
def seq_gather(x):
    seq, idxs = x
    idxs = K.cast(idxs, 'int32')
    return tf.gather_nd(seq, idxs)

bert_model = load_trained_model_from_checkpoint(paths.config, paths.checkpoint, seq_len=None)
for l in bert_model.layers:
    l.trainable = True

inp_token_ids = Input(shape=(None,), name='input_token_ids', dtype='int32')
inp_segment_ids = Input(shape=(None,), name='input_segment_ids', dtype='int32')
inp_header_ids = Input(shape=(None,), name='input_header_ids', dtype='int32')
inp_header_mask = Input(shape=(None,), name='input_header_mask')

x = bert_model([inp_token_ids, inp_segment_ids])  # (None, seq_len, 768)

# predict cond_conn_op
x_for_cond_conn_op = Lambda(lambda x: x[:, 0])(x)  # (None, 768)
p_cond_conn_op = Dense(num_cond_conn_op, activation='softmax', name='output_cond_conn_op')(x_for_cond_conn_op)

# predict sel_agg
x_for_header = Lambda(seq_gather)([x, inp_header_ids])  # (None, h_len, 768)
header_mask = Lambda(lambda x: K.expand_dims(x, axis=-1))(inp_header_mask)  # (None, h_len, 1)
#x_for_header = tf.keras.layers.Multiply()([x_for_header, header_mask])
#x_for_header = Masking()(x_for_header)
p_sel_agg = Dense(num_sel_agg, activation='softmax', name='output_sel_agg')(x_for_header)

x_for_cond_op = Concatenate(axis=-1)([x_for_header, p_sel_agg])
p_cond_op = Dense(num_cond_op, activation='softmax', name='output_cond_op')(x_for_cond_op)

model = Model(
    [inp_token_ids, inp_segment_ids, inp_header_ids, inp_header_mask],
    [p_cond_conn_op, p_sel_agg, p_cond_op]
)
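To make sure I understand what seq_gather is supposed to produce (one hidden vector per header position, picked row by row from each sequence in the batch), I reproduced the intended result in plain NumPy with a toy hidden size of 4. All the names and sizes here are just for illustration:

```python
import numpy as np

# fake BERT output: (batch, seq_len, hidden) = (2, 57, 4)
seq = np.arange(2 * 57 * 4).reshape(2, 57, 4).astype("float32")
header_ids = np.array([[33, 39, 45, 50],
                       [34, 40, 46, 51]])

# for each batch row, take the vector at each header position
x_for_header = np.take_along_axis(seq, header_ids[:, :, None], axis=1)
print(x_for_header.shape)  # (2, 4, 4) -> (batch, h_len, hidden)
```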
But when I ran the code, it raised an error saying that the multiplication requires a specific input shape, while the number of columns varies across examples. When I removed that line and continued, to check whether the results would be the same even without the masking, it raised another error: the last dimension of the input to the Dense layer must be defined.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-b0a3d0700bdf> in <module>()
10 #x_for_header = Masking()(x_for_header)
11
---> 12 p_sel_agg = Dense(num_sel_agg, activation='softmax', name='output_sel_agg')(x_for_header)
13
14 x_for_cond_op = Concatenate(axis=-1)([x_for_header, p_sel_agg])
1 frames
/usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py in build(self, input_shape)
137 last_dim = tf.compat.dimension_value(input_shape[-1])
138 if last_dim is None:
--> 139 raise ValueError('The last dimension of the inputs to a Dense layer '
140 'should be defined. Found None. '
141 f'Full input shape received: {input_shape}')
ValueError: The last dimension of the inputs to a Dense layer should be defined. Found None. Full input shape received: <unknown>
So I wonder whether something in the code is wrong, maybe the gather function? Or is it just the Keras version, since this code was pushed to GitHub in 2019? Is there some other way to achieve this?
Thanks in advance.