Loop through a pandas matrix (stored as a DataFrame) and change elements based on a condition

Posted on 2025-01-27 16:55:11

I have a pandas DataFrame in the form of the matrix below, which represents similarity scores between the elements (people) in the rows and the columns.

|            |     A     |     B     |    C     |
|------------|-----------|-----------|----------|
|      D     |    0.4    |    0.1    |   0.1    |
|      E     |    0.2    |    0.1    |   0.4    |
|      F     |    0.9    |    0.4    |   0.3    |
|      G     |    0.4    |    0.2    |   0.6    |
|      H     |    0.3    |    0.1    |   0.7    |

Further, I have a list of location identifiers for these elements.

A - London
B - Sydney
C - Paris
D - Paris
E - Delhi
F - London
G - Melbourne
H - Mumbai

I want to loop through the matrix and set the similarity score to 0 wherever the two elements share the same location. In this example, I want to replace the intersection of A and F (0.9) and the intersection of D and C (0.1) with 0.

Thanks!

Edit:

The final expected output I am looking for is shown below:

|            |     A     |     B     |    C     |
|------------|-----------|-----------|----------|
|      D     |    0.4    |    0.1    |   0.0    |
|      E     |    0.2    |    0.1    |   0.4    |
|      F     |    0.0    |    0.4    |   0.3    |
|      G     |    0.4    |    0.2    |   0.6    |
|      H     |    0.3    |    0.1    |   0.7    |


客…行舟 2025-02-03 16:55:11

Create a dictionary that maps each label to its city. Then rename the index and columns with it, compare the renamed index against the renamed columns using NumPy broadcasting, and finally pass the resulting boolean mask to DataFrame.mask:

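# Map each row or column label to its city.
d = {'A': 'London', 'B': 'Sydney', 'C': 'Paris', 'D': 'Paris',
     'E': 'Delhi', 'F': 'London', 'G': 'Melbourne', 'H': 'Mumbai'}

# Relabel rows and columns with their cities.
df1 = df.rename(index=d, columns=d)
# Broadcast the row cities (as a column vector) against the column cities,
# then set every cell where the two cities match to 0.
df = df.mask(df1.index.to_numpy()[:, None] == df1.columns.to_numpy(), 0)
print(df)
     A    B    C
D  0.4  0.1  0.0
E  0.2  0.1  0.4
F  0.0  0.4  0.3
G  0.4  0.2  0.6
H  0.3  0.1  0.7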
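
The snippet above assumes `df` already exists. To try it end to end, the matrix from the question has to be built first. The following is a minimal sketch under that assumption: the way `df` is constructed and the names `same_city` and `result` are mine, not part of the original answer.

```python
import pandas as pd

# Rebuild the similarity matrix from the question (rows D-H, columns A-C).
df = pd.DataFrame(
    [[0.4, 0.1, 0.1],
     [0.2, 0.1, 0.4],
     [0.9, 0.4, 0.3],
     [0.4, 0.2, 0.6],
     [0.3, 0.1, 0.7]],
    index=list("DEFGH"),
    columns=list("ABC"),
)

# Label -> city mapping, as given in the question.
d = {'A': 'London', 'B': 'Sydney', 'C': 'Paris', 'D': 'Paris',
     'E': 'Delhi', 'F': 'London', 'G': 'Melbourne', 'H': 'Mumbai'}

# Relabel rows and columns with their cities.
df1 = df.rename(index=d, columns=d)

# (5, 1) array of row cities compared with (3,) array of column cities
# broadcasts to a (5, 3) boolean array.
same_city = df1.index.to_numpy()[:, None] == df1.columns.to_numpy()

# Replace every cell flagged True with 0; the original df is left unchanged.
result = df.mask(same_city, 0)
print(result)
```

Keeping the outcome in a separate `result` variable, rather than overwriting `df`, makes it easy to compare the matrix before and after.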

Details:

print(df1)
           London  Sydney  Paris
Paris         0.4     0.1    0.1
Delhi         0.2     0.1    0.4
London        0.9     0.4    0.3
Melbourne     0.4     0.2    0.6
Mumbai        0.3     0.1    0.7

print(df1.index.to_numpy()[:, None] == df1.columns.to_numpy())
[[False False  True]
 [False False False]
 [ True False False]
 [False False False]
 [False False False]]
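
Since the question asked about looping, here is what an explicit loop could look like, for comparison with the mask-based approach. This is only a sketch: the construction of `df` and the name `looped` are assumptions of mine, and on a large matrix the vectorised version is likely to be noticeably faster.

```python
import pandas as pd

# Rebuild the similarity matrix from the question (rows D-H, columns A-C).
df = pd.DataFrame(
    [[0.4, 0.1, 0.1],
     [0.2, 0.1, 0.4],
     [0.9, 0.4, 0.3],
     [0.4, 0.2, 0.6],
     [0.3, 0.1, 0.7]],
    index=list("DEFGH"),
    columns=list("ABC"),
)

# Label -> city mapping, as given in the question.
d = {'A': 'London', 'B': 'Sydney', 'C': 'Paris', 'D': 'Paris',
     'E': 'Delhi', 'F': 'London', 'G': 'Melbourne', 'H': 'Mumbai'}

# Work on a copy so the original scores stay available.
looped = df.copy()
for row in looped.index:
    for col in looped.columns:
        # Zero the score when the row person and column person share a city.
        if d[row] == d[col]:
            looped.loc[row, col] = 0.0

print(looped)
```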
