Comparing two DataFrame columns with binary data

Posted 2025-02-03 16:26:16, 344 characters, 4 views, 0 comments

I have two columns with binary data (1s and 0s), and I want to check the percent similarity between one column and the other. Because they are binary, the match must be based on the position of each cell, not on the overall count of 0s and 1s. For example:

column_1     column_2
   0            1
   1            1
   0            0
   1            0

In this case, both columns contain the same number of 0s and 1s (which would suggest a 100% match); however, taking the position of each value into account, there is only a 50% match. That last figure is the one I'm trying to compute.

I know I could do it with a loop, but with larger lists that could become a problem.

Comments (1)

禾厶谷欠 2025-02-10 16:26:17

This builds a boolean vector that is True where column_1 equals column_2 and False elsewhere, sums it (each True counts as 1), and divides by the number of samples.

sim = sum( df.column_1 == df.column_2 ) / len(df.column_1)
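
As a rough sketch of the same idea end to end, the snippet below rebuilds the example table from the question and takes the mean of the element-wise comparison, which gives the same fraction as summing and dividing by the length. The variable name `matches` is introduced here only for illustration, and the snippet assumes both columns live in the same DataFrame so they are already aligned row by row.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "column_1": [0, 1, 0, 1],
    "column_2": [1, 1, 0, 0],
})

# Element-wise comparison: True where the two columns agree on a row.
matches = df["column_1"] == df["column_2"]

# The mean of a boolean Series is the fraction of True values,
# i.e. sum(matches) / len(matches).
sim = matches.mean()

print(f"{sim:.0%}")  # 50%
```

Note that a row where either value is NaN compares as False, so missing values would count as mismatches with this approach.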