Dimensionality reduction on a list of word vectors

Posted 2025-01-21 04:52:51

I have a set of vectors that represent words. Each vector has 300 features, meaning there are 300 floats per vector. My goal is to reduce the dimensionality, e.g. to 50, so that I can save some space.

How can I apply dimensionality reduction to this vector set using e.g. TensorFlow? I couldn't find a method or implementation that takes a list of vectors as input and reduces it.

Comments (1)

溺深海 2025-01-28 04:52:51

You might want to look into convolutional neural networks (CNNs) for text processing. CNNs in general are known for reducing the dimensionality of their input vectors. They are usually used for image classification, but they also work for text and sentence classification. What you are looking for is an embedding of the input vector. Quote:

Now that our words have been replaced by numbers, we could simply do one-hot encoding but that would result in an extremely wide input — there are thousands of unique words in the titles dataset. A better approach is to reduce the dimensionality of the input — this is done through an embedding layer (see full code here):

This is from here:

TowardsDataScience
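
To make the quoted idea concrete, here is a minimal sketch of what an embedding layer computes, written in plain NumPy rather than TensorFlow. The vocabulary size, the target dimension of 50, and the word IDs are made-up values for illustration, and the matrix is random here, whereas a real embedding layer would learn it during training.

```python
import numpy as np

# Hypothetical sizes: 10,000 distinct words, 50-dimensional embeddings.
vocab_size, embed_dim = 10_000, 50

rng = np.random.default_rng(0)
# Random stand-in for the weight matrix a real embedding layer would learn.
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

# Three words, already replaced by integer IDs (made-up values).
word_ids = np.array([12, 7, 4051])

# An embedding lookup simply selects one row per word ID.
dense = embedding_matrix[word_ids]

# The same result via the "extremely wide" one-hot route from the quote.
one_hot = np.zeros((len(word_ids), vocab_size))
one_hot[np.arange(len(word_ids)), word_ids] = 1.0
same = one_hot @ embedding_matrix

print(one_hot.shape, dense.shape)
```

The lookup and the one-hot product give the same vectors, which is why the embedding layer can be read as a cheap way of projecting a very wide input down to a few dozen dimensions.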

Another resource:

AnalyticsVidhya
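
If the 300-dimensional vectors already exist and the goal is only to shrink them to 50 dimensions, a linear projection such as PCA is one option that needs no training loop. This is not taken from the linked articles; it is a rough sketch in NumPy, with random data standing in for the real word vectors and `reduce_with_pca` being a hypothetical helper name.

```python
import numpy as np

def reduce_with_pca(vectors, n_components=50):
    """Project row vectors onto their top principal components (hypothetical helper)."""
    X = np.asarray(vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    return reduced, components, mean

# Placeholder data: 1,000 random "word vectors" with 300 features each.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(1000, 300))

reduced, components, mean = reduce_with_pca(word_vectors, n_components=50)
print(reduced.shape)

# A vector that was not in the original set can be projected the same way.
new_vector = rng.normal(size=300)
new_reduced = (new_vector - mean) @ components.T
```

How much information survives the projection depends on the data, so it is worth checking the reduced vectors on whatever downstream task they are meant for.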
