How does the Tokenizer in TensorFlow handle out-of-vocabulary tokens if I don't provide an oov_token?
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)
padded_sequence = pad_sequences(encoded_docs, maxlen=60)
test_tweets = tokenizer.texts_to_sequences(X_test)
test_padded_sequence = pad_sequences(test_tweets, maxlen=60)
I didn't get any error with that code even though I didn't provide the oov_token argument. I expected an error at test_tweets = tokenizer.texts_to_sequences(X_test). How does TensorFlow deal with out-of-vocabulary words at test time when you don't provide an oov_token?
If oov_token is None, OOV words will simply be ignored / discarded by default.
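The behavior can be illustrated with a minimal pure-Python sketch of what texts_to_sequences does. This is a hypothetical re-implementation for illustration only, not the real class from tensorflow.keras.preprocessing.text (the real Tokenizer, for instance, orders indices by word frequency rather than first occurrence, and registers the oov_token during fitting):

```python
def fit_on_texts(texts):
    """Build a 1-based word -> index vocabulary, like Keras does."""
    word_index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

def texts_to_sequences(texts, word_index, oov_token=None):
    """Convert texts to index sequences; unknown words are silently
    skipped unless an oov_token index exists in the vocabulary."""
    oov_index = word_index.get(oov_token) if oov_token else None
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            idx = word_index.get(word)
            if idx is not None:
                seq.append(idx)
            elif oov_index is not None:
                seq.append(oov_index)
            # else: the OOV word is dropped -- no error is raised
        sequences.append(seq)
    return sequences

word_index = fit_on_texts(["the cat sat"])
print(texts_to_sequences(["the dog sat"], word_index))
# "dog" was never seen during fitting, so it is dropped:
# the output sequence is just shorter, which is why no error appears.
```

That silent shortening is also why padding the test sequences still works: pad_sequences only sees shorter lists, never a missing-key error.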