TensorFlow word embedding model + LDA: Negative values in data passed to LatentDirichletAllocation.fit
I am trying to use a pre-trained model from TensorFlow Hub, instead of frequency-based vectorization techniques, to embed the text before passing the resulting feature vectors to the LDA model.
I followed the steps for the TensorFlow model, but I got this error when passing the resulting feature vectors to the LDA model:
Negative values in data passed to LatentDirichletAllocation.fit
Here's what I have implemented so far:
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow_hub as hub
from sklearn.decomposition import LatentDirichletAllocation
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50-with-normalization/1")
embeddings = embed(["cat is on the mat", "dog is in the fog"])
lda_model = LatentDirichletAllocation(n_components=2, max_iter=50)
lda = lda_model.fit_transform(embeddings)
I realized that print(embeddings) prints some negative values, as shown below:
tf.Tensor(
[[ 0.16589954 0.0254965 0.1574857 0.17688066 0.02911299 -0.03092718
0.19445257 -0.05709129 -0.08631689 -0.04391516 0.13032274 0.10905275
-0.08515751 0.01056632 -0.17220995 -0.17925954 0.19556305 0.0802278
-0.03247919 -0.49176937 -0.07767699 -0.03160921 -0.13952136 0.05959712
0.06858718 0.22386682 -0.16653948 0.19412343 -0.05491862 0.10997339
-0.15811177 -0.02576607 -0.07910853 -0.258499 -0.04206644 -0.20052543
0.1705603 -0.15314153 0.0039225 -0.28694248 0.02468278 0.11069503
0.03733957 0.01433943 -0.11048374 0.11931834 -0.11552787 -0.11110869
0.02384969 -0.07074881]
Is there a way to solve this?
Comments (1)
As the fit function of LatentDirichletAllocation does not accept arrays with negative values, I recommend applying softplus to the embeddings. Here is the code snippet:
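A minimal sketch of that suggestion, under some assumptions: the embeddings array below is random stand-in data with the same shape as the output of embed([...]) in the question (2 sentences x 50 dimensions), so the example runs without downloading the TensorFlow Hub model. The softplus helper is a hypothetical NumPy function written for this sketch; with a real tf.Tensor, tf.nn.softplus(embeddings).numpy() should compute the same function, log(1 + exp(x)).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); the result is always > 0."""
    return np.logaddexp(0.0, np.asarray(x, dtype=np.float64))


# Assumption: stand-in for embed(["cat is on the mat", "dog is in the fog"]).
# Random values with mixed signs, shaped like the real output (2 x 50).
rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.15, size=(2, 50))

# Map every value to a strictly positive one before handing it to LDA.
positive_embeddings = softplus(embeddings)

lda_model = LatentDirichletAllocation(n_components=2, max_iter=50, random_state=0)
doc_topics = lda_model.fit_transform(positive_embeddings)

print(doc_topics.shape)  # (2, 2): one topic distribution per sentence
```

This removes the error because softplus outputs are strictly positive. Note, however, that LatentDirichletAllocation is designed for count-like input such as term frequencies, so topics fitted on transformed dense embeddings may not be easy to interpret.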