Find the distance between rows in a pandas DataFrame, but with reference to one row

In this pandas dataframe:

y_train     feat1   feat2
0   9.596113    -7.900107
1   -1.384157   2.685313
2   -8.211954   5.214797

How do I go about adding a "distance from Class 0" column at the end of the dataframe, which gives the distance from y_train=0 for each class (i.e. each row)? I want to use class 0 as the reference. In this dataframe, feat1 = x and feat2 = y.

I tried:

from sklearn.metrics import pairwise_distances

pairwise_distances(df_centroid['feat1'].values, df_centroid['feat2'].values)

but that gave me an error:

ValueError: Expected 2D array, got 1D array instead:

Any help will be greatly appreciated!

Thanks!

泪痕残 2025-02-17 21:44:17

pairwise_distances wants a first input X (all the points) and then a second input Y (the points we want to compute the distance to).

So for X we have all the classes. Each feature is one coordinate of a class's location; in mathematical terms, a class is a vector f = [f0, f1] where each fi is a feature weight.

For Y we have class 0. We want to compute the distance to class 0 for each row of X.
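To make the required shapes concrete, here is a minimal sketch (the numbers are just the values from the dataframe in the question; the names X and Y are mine):

import numpy as np
from sklearn.metrics import pairwise_distances

# X: one row per class, one column per feature -> shape (3, 2)
X = np.array([[ 9.596113, -7.900107],
              [-1.384157,  2.685313],
              [-8.211954,  5.214797]])
# Y: the reference point (class 0), kept 2D -> shape (1, 2)
Y = X[0:1, :]
pairwise_distances(X, Y)  # result has shape (3, 1): one distance per class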

Just experimenting a bit, we can see that:

import io

import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

df = pd.read_table(io.StringIO("""
y_train     feat1   feat2
0   9.596113    -7.900107
1   -1.384157   2.685313
2   -8.211954   5.214797
"""), sep=r"\s+")

# The feature matrix: one row per class, one column per feature
df[['feat1', 'feat2']].to_numpy()
# array([[ 9.596113, -7.900107],
#        [-1.384157,  2.685313],
#        [-8.211954,  5.214797]])

# Distances from every class to class 0's coordinates, passed as a 2D list
pairwise_distances(df[['feat1', 'feat2']].to_numpy(), [[9.596113, -7.900107]])
# array([[ 0.        ],
#        [15.2518014 ],
#        [22.11623741]])

Aha, so let's go ahead and do this properly

We want an ndim=2 array as input to pairwise_distances in both cases, which is the reason I use a 2D slice for .loc, i.e. 0:0. (The equivalent of .to_numpy() happens automatically when you pass a DataFrame, but remember to think about how pairwise_distances would handle missing data.)
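A quick way to see why the 2D slice matters (assuming df is the frame built above): a scalar .loc label collapses to a 1D Series, which is exactly what triggers the error from the question.

df.loc[0, ['feat1', 'feat2']].shape    # (2,)   -> 1D Series, would raise "Expected 2D array"
df.loc[0:0, ['feat1', 'feat2']].shape  # (1, 2) -> one-row DataFrame, accepted as Y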

# Distance from each row to row 0 (class 0), added as a new column
df['distance'] = pairwise_distances(df[['feat1', 'feat2']],
                                    df.loc[0:0, ['feat1', 'feat2']])
df
#    y_train     feat1     feat2   distance
# 0        0  9.596113 -7.900107   0.000000
# 1        1 -1.384157  2.685313  15.251801
# 2        2 -8.211954  5.214797  22.116237

I've just used the distance metric from the call in your question (pairwise_distances defaults to Euclidean distance). Now that you see how it can be used, you're free to replace it with another metric; the sklearn API is quite flexible w.r.t. interchangeable algorithms in this way.
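For example, swapping in a different metric is just a keyword argument; the sketch below uses 'manhattan' purely as an illustration, and the new column name is my own choice:

# Same reference row, different metric (illustrative only)
df['manhattan_distance'] = pairwise_distances(df[['feat1', 'feat2']],
                                              df.loc[0:0, ['feat1', 'feat2']],
                                              metric='manhattan')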
