Meaning of max and argmax for scipy.sparse.csr.csr_matrix

Posted on 2025-01-19 20:11:34


I have this tf-idf matrix:

type(dt)  # output: scipy.sparse.csr.csr_matrix
pd.DataFrame(dt.toarray())

# output:

        0          1            2           3        4          5
0   0.000000    0.000000    0.500000    0.500000    0.5    0.50000
1   0.707107    0.707107    0.000000    0.000000    0.0    0.00000
2   0.000000    0.000000    0.000000    0.000000    0.0    0.00000
3   0.000000    0.000000    0.707107    0.707107    0.0    0.00000
4   0.000000    0.000000    0.000000    0.000000    0.0    0.00000
5   0.000000    0.000000    0.000000    0.000000    0.0    0.00000
6   0.577350    0.577350    0.000000    0.000000    0.0    0.57735
7   0.000000    0.000000    0.000000    0.000000    0.0    0.00000
8   0.000000    0.000000    0.000000    0.000000    0.0    0.00000
9   0.000000    0.000000    0.000000    0.000000    1.0    0.00000

I ran this code to understand what max and argmax return for the matrix:

test = np.dot(dt, np.transpose(dt))
test[test > 0.9999] = np.nan
ind = np.unravel_index(np.argmax(test), test.shape)
print('shape of test', test.shape)
print(f'max of test: {test.max()}')
print(f'argmax of test: {np.argmax(test)}')
print('location of max value:', ind)
print('value at the location:', test[ind])
print(pd.DataFrame(test.toarray()))

It produced this output:

shape of test (10, 10)
max of test: nan
argmax of test: 1
location of max value: (0, 1)
value at the location: 0.0
          0         1    2         3    4    5         6    7    8    9
0       NaN  0.000000  0.0  0.707107  0.0  0.0  0.288675  0.0  0.0  0.5
1  0.000000       NaN  0.0  0.000000  0.0  0.0  0.816497  0.0  0.0  0.0
2  0.000000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0
3  0.707107  0.000000  0.0       NaN  0.0  0.0  0.000000  0.0  0.0  0.0
4  0.000000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0
5  0.000000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0
6  0.288675  0.816497  0.0  0.000000  0.0  0.0       NaN  0.0  0.0  0.0
7  0.000000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0
8  0.000000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0
9  0.500000  0.000000  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  NaN

But I can't make sense of three lines of the output: max of test: nan, argmax of test: 1, and location of max value: (0, 1). I expected the max of test to be 0.816497 rather than nan, and argmax to point at that value rather than return 1, so that the location of the max value would be (6, 1) or (1, 6), where 0.816497 is displayed.

Could someone please explain what the code for max of test, argmax of test and location of max value did?
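
For reference, here is a minimal sketch of one way to get the result I expected. It assumes it is acceptable to convert the product to a dense array and to use NumPy's NaN-aware reductions (np.nanmax, np.nanargmax); plain max/argmax are not reliable once NaN is present, because NaN does not compare as larger or smaller than any number. The matrix is rebuilt from the rounded values printed above, and the names sim, best, flat and ind are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Rebuild the tf-idf matrix from the (rounded) values printed in the question.
rows = np.zeros((10, 6))
rows[0, 2:6] = 0.5
rows[1, 0:2] = 0.707107
rows[3, 2:4] = 0.707107
rows[6, [0, 1, 5]] = 0.57735
rows[9, 4] = 1.0
dt = csr_matrix(rows)

# Pairwise dot products between rows, converted to a dense ndarray.
sim = (dt @ dt.T).toarray()

# Mask the (near-)1.0 self-similarities, as in the original code.
sim[sim > 0.9999] = np.nan

# NaN-aware reductions skip the masked entries.
best = np.nanmax(sim)
flat = np.nanargmax(sim)  # index into the flattened (row-major) array
ind = tuple(int(i) for i in np.unravel_index(flat, sim.shape))

print('max ignoring NaN:', best)
print('flat argmax:', flat)
print('location of max value:', ind)
print('value at the location:', sim[ind])
```

Without an axis argument, argmax returns a single index into the flattened array, which is why np.unravel_index is needed to turn it back into a (row, column) pair. Since 0.816497 appears at both (1, 6) and (6, 1), the first occurrence in row-major order is the one reported.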
