How do I check whether the items in a nested list exist in a set?
I have a nested list of every sentence from a corpus. The set is all the words that occur more than once. How would I check if each word within the list is in the set containing only words that occur once?
I then need to replace all words that occur more than once with the string '<UNK>'.
I tried:
    for sent in tokenized_sents:
        for word in sent:
            if word in set:
                word = '<UNK>'
Comments (1)
You can create a dictionary that keeps track of the number of occurrences of each word in your corpus with collections.Counter.
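As a rough sketch of how that could look: the snippet below counts every token with collections.Counter, builds the set of words that occur more than once, and then constructs new sentence lists with those words replaced. The sample sentences and the names counts, repeated_words, and replaced are assumptions made for illustration, not part of the original question. New lists are built because assigning to the loop variable (word = '<UNK>') only rebinds that name and leaves the list unchanged.

```python
from collections import Counter

# Hypothetical sample data standing in for the asker's tokenized_sents.
tokenized_sents = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "down"],
]

# Count every token across all sentences.
counts = Counter(word for sent in tokenized_sents for word in sent)

# Words that occur more than once. The name repeated_words is an
# illustrative choice that avoids shadowing the built-in set.
repeated_words = {word for word, n in counts.items() if n > 1}

# Build new lists rather than rebinding the loop variable, which
# would leave the original lists untouched.
replaced = [
    ["<UNK>" if word in repeated_words else word for word in sent]
    for sent in tokenized_sents
]

print(replaced)
```

If the goal is instead to replace the words that occur only once, the same structure should work with the condition changed to n == 1.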