What does tfidfvectorizer.transform() actually produce?

Posted on 2025-01-30 10:18:53, 910 characters, 3 views, 0 comments

I am new to using the TF-IDF vectorizer. When I ran the code below, I got this output but could not interpret what it actually means.

Code

import numpy as np
# tfidfvectorizer is assumed to be a TfidfVectorizer that was already fitted elsewhere
X = ["Access modes govern the type of operations possible in the opened file. It refers to how the file will be used once its opened. These modes also define the location of the File Handle in the file.",
     "File handle is like a cursor, which defines from where the data has to be read or written in the file. There are 6 access modes in python."]
X = np.array(X)
ans = tfidfvectorizer.transform(X)  # sparse matrix: one row per document
print(ans)

**OUTPUT**

  (0, 247682)   0.34757472043242427
  (0, 235525)   0.11981132543319443
  (0, 232967)   0.27278177118815816
  (0, 165607)   0.6769351735727495
  (1, 247953)   0.2657562514567408
  (1, 232967)   0.2589999033874122
  (1, 230813)   0.28434013277955594
  (1, 202607)   0.22380408029504645

Can anyone tell me what (0, 247682) and (1, 247953) mean?

Comments (1)

梦醒灬来后我 2025-02-06 10:18:53

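First, there are two sentences (documents) in your data set. Each word is assigned a word id, which is its column index in the vocabulary the vectorizer learned when it was fitted.

In (0, 247682):

0 is the document id (the first sentence), 247682 is the word id, and 0.34757472043242427 is that word's TF-IDF score in that document. The same word id appearing in both rows, such as 232967, means the same word occurs in both sentences.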
