Is using the embeddings as input of the new layers enough for the transfer learning?
This should work as expected. However, keep in mind that your generalization to truly unseen data points may be lower than your validation numbers suggest. Usually, when using a pre-trained model, every data point is unseen by the original network, but in your case some of your data points may have been part of the pre-trained model's training set, so their measured performance could be unrealistically high compared with data the pre-trained model has never seen.
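As a minimal sketch of the setup being discussed (all names here are hypothetical, and the embeddings are simulated with random data since the original model isn't specified), training new layers on top of frozen, precomputed embeddings might look like:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical setup: embeddings already exported by a frozen pre-trained
# model; here they are simulated as a random array for illustration.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))  # 200 samples, 64-dim embeddings
labels = rng.integers(0, 2, size=200)    # binary target for the new task

# New trainable "head" layers that take the embeddings as input; the
# pre-trained network itself is never updated.
head = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
head.fit(embeddings, labels)

print(head.score(embeddings, labels))
```

The key point is that the pre-trained model only ever acts as a fixed feature extractor: everything learned for the new task lives in the head.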
Is it okay to split the embeddings to train, test and validate dataset?
This is a good approach, and it helps with the issue from the previous point. If you don't know which data points were used to train the pre-trained model, you could also use cross-validation, creating multiple splits to reduce the impact of any such overlap.
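A short sketch of both ideas on the precomputed embeddings (again with simulated data and hypothetical names), using a standard 60/20/20 train/validation/test split plus cross-validation over multiple splits:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Simulated embeddings standing in for the real precomputed ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))
y = rng.integers(0, 2, size=300)

# Train / validation / test split (60% / 20% / 20%) on the embeddings.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0
)

# Cross-validation over several different splits reduces the impact of
# not knowing which points the pre-trained model saw during its own training.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```

Averaging the fold scores gives a more stable estimate than any single split, which is exactly why it mitigates the unknown-overlap problem.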