What does tfidfvectorizer.transform() actually produce?
I am new to using the tf-idf vectorizer. While running the code below, I got this output but could not interpret what it actually means.
Code
X=["Access modes govern the type of operations possible in the opened file. It refers to how the file will be used once its opened. These modes also define the location of the File Handle in the file.","File handle is like a cursor, which defines from where the data has to be read or written in the file. There are 6 access modes in python."]
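import numpy as np  # needed for np.array

X = np.array(X)
# tfidfvectorizer is a TfidfVectorizer that was already fitted elsewhere (fitting step not shown)
ans = tfidfvectorizer.transform(X)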
print(ans)
**OUTPUT**
(0, 247682) 0.34757472043242427
(0, 235525) 0.11981132543319443
(0, 232967) 0.27278177118815816
(0, 165607) 0.6769351735727495
(1, 247953) 0.2657562514567408
(1, 232967) 0.2589999033874122
(1, 230813) 0.28434013277955594
(1, 202607) 0.22380408029504645
Can anyone tell me what (0, 247682) and (1, 247953) mean?
Comments (1)
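Firstly, there are two sentences in your data set, so the result has two rows. Each word found in these sentences is assigned a word id, which is the index of that word's column in the vocabulary the vectorizer learned when it was fitted.
In (0, 247682), 0 is the document id (the first sentence), 247682 is the word id, and 0.34757472043242427 is the TF-IDF score of that word in that document. Likewise, (1, 247953) refers to the word with id 247953 in the second sentence. Only non-zero scores are printed, because the result is a sparse matrix.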
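To see which word is behind each id, you can invert the vectorizer's `vocabulary_` mapping and walk the non-zero entries of the matrix. The following is only a minimal sketch: the two-sentence `corpus` is a made-up toy example (the vectorizer in the question was evidently fitted on something much larger, so the ids here will be small), and `id_to_term` and `entries` are names introduced just for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, NOT the data the original vectorizer was fitted on.
corpus = [
    "access modes govern file operations",
    "file handle is like a cursor",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)  # sparse matrix, shape (n_documents, n_terms)

# vocabulary_ maps term -> column index; invert it to look up a term by word id.
id_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}

# Convert to COO format to iterate over (document id, word id, score) triples.
coo = matrix.tocoo()
entries = [
    (int(doc_id), int(word_id), id_to_term[int(word_id)], float(score))
    for doc_id, word_id, score in zip(coo.row, coo.col, coo.data)
]

for doc_id, word_id, term, score in entries:
    # Same layout as print(matrix), with the term added for readability.
    print(f"({doc_id}, {word_id}) {term!r} {score:.4f}")
```

With the vectorizer from the question, the same lookup would tell you which word id 247682 stands for. Note that with the default settings single-character tokens such as "a" are dropped, so they never get a word id.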