TensorFlow word-embedding model + LDA: Negative values in data passed to LatentDirichletAllocation.fit

Posted 2025-01-09 12:49:07

I am trying to embed my text with a pre-trained model from TensorFlow Hub, instead of a frequency-based vectorization technique, before passing the resulting feature vectors to an LDA model.

I followed the usage steps for the TensorFlow Hub model, but passing the resulting feature vectors to the LDA model raises this error:

Negative values in data passed to LatentDirichletAllocation.fit

Here's what I have implemented so far:

```python
import tensorflow_hub as hub
from sklearn.decomposition import LatentDirichletAllocation

# Load the pre-trained NNLM text-embedding model from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50-with-normalization/1")

# Embed each sentence as a 50-dimensional vector (result shape: 2 x 50)
embeddings = embed(["cat is on the mat", "dog is in the fog"])

# Fit a 2-topic LDA model on the embeddings; this call raises the error
lda_model = LatentDirichletAllocation(n_components=2, max_iter=50)
lda = lda_model.fit_transform(embeddings.numpy())
```

I noticed that print(embeddings) shows negative values, for example:

```
tf.Tensor(
[[ 0.16589954  0.0254965   0.1574857   0.17688066  0.02911299 -0.03092718
   0.19445257 -0.05709129 -0.08631689 -0.04391516  0.13032274  0.10905275
  -0.08515751  0.01056632 -0.17220995 -0.17925954  0.19556305  0.0802278
  -0.03247919 -0.49176937 -0.07767699 -0.03160921 -0.13952136  0.05959712
   0.06858718  0.22386682 -0.16653948  0.19412343 -0.05491862  0.10997339
  -0.15811177 -0.02576607 -0.07910853 -0.258499   -0.04206644 -0.20052543
   0.1705603  -0.15314153  0.0039225  -0.28694248  0.02468278  0.11069503
   0.03733957  0.01433943 -0.11048374  0.11931834 -0.11552787 -0.11110869
   0.02384969 -0.07074881]
```

Is there a way to fix this?
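For context, the error comes from a non-negativity check that scikit-learn runs on LDA input before fitting, because LDA expects count-like data. The NumPy sketch below mimics that check with a hypothetical helper, assert_non_negative (our own name, not scikit-learn's API), applied to a toy term-count matrix and to a few entries copied from the embedding printed above.

```python
import numpy as np


def assert_non_negative(X, whom):
    """Hypothetical stand-in for scikit-learn's input check (not its real API).

    Raises ValueError when any entry of X is negative, using the same wording
    as the error in the question.
    """
    X = np.asarray(X, dtype=float)
    if X.size and X.min() < 0:
        raise ValueError(f"Negative values in data passed to {whom}")
    return X


# A bag-of-words count matrix (2 documents x 3 terms) is non-negative and passes.
counts = np.array([[2, 0, 1], [0, 3, 1]])
checked = assert_non_negative(counts, "LatentDirichletAllocation.fit")

# A few entries copied from the NNLM embedding printed in the question.
embedding_row = np.array([[0.16589954, 0.0254965, -0.03092718, -0.05709129]])
try:
    assert_non_negative(embedding_row, "LatentDirichletAllocation.fit")
    message = "no error"
except ValueError as err:
    message = str(err)

print(message)  # Negative values in data passed to LatentDirichletAllocation.fit
```

Frequency vectorizers only ever produce counts or non-negative weights, which is why the same pipeline worked before switching to embeddings.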

Answer by 只是在用心讲痛 (2025-01-16 12:49:07)

Because LatentDirichletAllocation.fit rejects any input that contains a negative value (LDA treats each row as non-negative, count-like data), I recommend applying softplus to the embeddings so that every entry becomes strictly positive.
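To see why this removes the error, here is a NumPy-only sketch of softplus, softplus(x) = log(1 + exp(x)), computed stably as np.logaddexp(0, x). The helper name softplus and the sample values (copied from the printout in the question) are for illustration only; the answer itself uses TensorFlow's softplus.

```python
import numpy as np


def softplus(x):
    """softplus(x) = log(1 + exp(x)), computed stably as logaddexp(0, x)."""
    return np.logaddexp(0.0, x)


# Sample entries from the first embedding row printed in the question.
row = np.array([0.16589954, -0.03092718, -0.49176937, -0.28694248, 0.22386682])
positive_row = softplus(row)

# Every output is strictly positive, so the LDA input check passes.
print(bool((positive_row > 0).all()))  # True
# softplus is strictly increasing, so the ordering of the entries is preserved.
print(np.array_equal(np.argsort(row), np.argsort(positive_row)))  # True
```

Because these raw values are small, the outputs cluster around softplus(0) = log 2 ≈ 0.693: softplus mostly shifts the values upward while keeping their relative order.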

Here is the code snippet:

```python
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.decomposition import LatentDirichletAllocation

# Load the pre-trained NNLM text-embedding model (50-dim output per sentence)
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50-with-normalization/1")

# softplus(x) = log(1 + exp(x)) > 0, so every embedding entry becomes strictly positive
embeddings = tf.math.softplus(embed(["cat is on the mat", "dog is in the fog"]))

# Convert the tf.Tensor to a NumPy array before handing it to scikit-learn
lda_model = LatentDirichletAllocation(n_components=2, max_iter=50)
lda = lda_model.fit_transform(embeddings.numpy())
```
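The TensorFlow Hub model needs a network download, so the sketch below swaps in random vectors of the same shape (2 sentences x 50 dimensions) as an assumption and runs the same pipeline offline, using NumPy's logaddexp as the softplus: the raw matrix triggers the error, and the softplus-transformed matrix fits. The random_state argument is added only to make the run repeatable.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Assumption: random mixed-sign vectors stand in for embed([...]); same 2 x 50 shape,
# different values from the real NNLM output.
rng = np.random.default_rng(0)
raw_embeddings = rng.normal(scale=0.15, size=(2, 50))

lda_model = LatentDirichletAllocation(n_components=2, max_iter=50, random_state=0)

# Without a transform, fitting rejects the negative entries.
error_message = "no error"
try:
    lda_model.fit_transform(raw_embeddings)
except ValueError as err:
    error_message = str(err)

# softplus(x) = log(1 + exp(x)) makes every entry strictly positive, so fitting succeeds.
positive_embeddings = np.logaddexp(0.0, raw_embeddings)
doc_topics = lda_model.fit_transform(positive_embeddings)

print(error_message)
print(doc_topics.shape)  # (2, 2): one topic distribution per sentence
```

Keep in mind that LDA is a model of word counts: with embeddings as input, each topic in lda_model.components_ is a distribution over embedding dimensions rather than over words, so the topics are likely harder to interpret than those learned from a frequency-based matrix.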
