I have generated a bunch of word embeddings using a fine-tuned transformer model from the HuggingFace Transformers library.
Now I would like to do some quick evaluation of whether the results are any good. I stumbled upon Gensim and saw that it has handy functions, for example model.wv.most_similar(),
and probably a few others that I could use down the line.
I was wondering if, instead of loading a Gensim model, I could somehow import my embedding table into it and have it use that instead, so I don't have to implement all of those functions on my own.
My embeddings are currently in a dictionary whose key-value pairs are words and their embedding vectors, though I could reasonably save them in any other format.
Did some digging and found this article: https://www.kaggle.com/code/matsuik/convert-embedding-dictionary-to-gensim-w2v-format/notebook
It shows how to convert an embedding dictionary into Gensim's word2vec format so it can be loaded as a KeyedVectors object.
It needed some small modifications, but it seems to work just fine for my use case.
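A minimal sketch of that conversion, assuming the embeddings live in a plain `{word: vector}` dict as described in the question (the toy vectors and the file name `embeddings.txt` are made up for illustration): write the dict out in the word2vec text format (a `"<vocab_size> <dim>"` header, then one `word v1 v2 ...` line per entry), which Gensim can read back with `KeyedVectors.load_word2vec_format()`.

```python
def save_word2vec_format(path, embeddings):
    """Write a {word: vector} dict in word2vec text format.

    First line is "<vocab_size> <dim>", then one "word v1 v2 ..." line
    per word. Assumes all vectors have the same dimensionality.
    """
    dim = len(next(iter(embeddings.values())))
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(embeddings)} {dim}\n")
        for word, vec in embeddings.items():
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")

# Toy 2-dimensional embeddings, purely for illustration.
emb = {
    "king": [0.1, 0.2],
    "queen": [0.1, 0.25],
    "apple": [-0.9, 0.4],
}
save_word2vec_format("embeddings.txt", emb)

# The file can then be loaded into Gensim and queried like a trained model
# (guarded here so the script still runs where gensim is not installed).
try:
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("embeddings.txt", binary=False)
    print(kv.most_similar("king", topn=1))
except ImportError:
    pass
```

In Gensim 4.x you can also skip the file round-trip and build the `KeyedVectors` in memory: create `KeyedVectors(vector_size=dim)` and call `add_vectors()` with the word list and a matrix of the corresponding vectors.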