pandas vectorized lookup without the deprecated lookup()

Posted 2025-02-10 06:30:45 | 1861 characters | 1 view | 0 comments

My problem concerns DataFrame.lookup(), which is deprecated, so I'm looking for an alternative. The documentation suggests using .loc (which does not seem to work with a vectorized approach) or melt() (which seems quite convoluted). It also suggests factorize(), which (I think) does not work for my setup.

Here is the problem:
I have a 2-column DataFrame with x,y-values.

import random
import pandas as pd

k = 20
y = random.choices(range(1,4),k=k)  # y values drawn from 1..3
x = random.choices(range(1,7),k=k)  # x values drawn from 1..6
tuples = list(zip(x,y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df

I also have several DataFrames laid out like a crosstab of df (y values as the index, x values as the columns). For example, one called Cij:

Concordance table (Cij):
x      1     2     3     4     5     6  RTotal
y
1     16    15    13   NaN     5   NaN     108
2    NaN    12   NaN    15   NaN   NaN      87
3    NaN   NaN     6   NaN    13    14     121

I now want to perform a vectorized lookup in Cij using the (y, x) pairs in df, to generate a new column Crc in df. So far this looked as follows (plain and simple):

df["Crc"] = Cij.lookup(df["y"],df["x"])

How can I achieve the same thing without lookup()? Or did I just not understand the suggested alternatives?

Thanks in advance!

Addendum: Working code example as requested.

import pandas as pd

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])

# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns) #one of several dfs in ct_a layout

#row-wise, then column-wise filling of Cij (only cells with a non-zero count are filled)
for i in range(ct_a.shape[0]):           
  for j in range(ct_a.shape[1]):
    if ct_a.iloc[i,j] != 0:
      Cij.iloc[i,j]= ct_a.iloc[i+1:,j+1:].sum().sum()+ct_a.iloc[:i,:j].sum().sum()

#vectorized lookup, to be substituted with future-proof method
df["Crc"] = Cij.lookup(df["y"],df["x"])

Note: In this case the loop-based "filling" of Cij is fine, since crosstabs of df are always small. However, df itself can be very large, so a vectorized lookup is a necessity.

夕嗳→ 2025-02-17 06:30:45

IIUC, you can stack Cij and then reindex based on a list of tuples created by using zip:

df['Crc'] = Cij.stack().reindex(zip(df['y'], df['x'])).to_numpy()
print(df)

Output:

    y  x   Crc
0   1  1  16.0
1   1  1  16.0
2   1  2  15.0
3   1  2  15.0
4   1  2  15.0
5   1  3  13.0
6   1  3  13.0
7   1  5   5.0
8   2  2    12
9   2  4    15
10  2  4    15
11  2  4    15
12  2  4    15
13  2  4    15
14  3  3   6.0
15  3  3   6.0
16  3  5  13.0
17  3  5  13.0
18  3  5  13.0
19  3  6  14.0
20  3  6  14.0
21  3  6  14.0
22  3  6  14.0
23  3  6  14.0
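
The stack-and-reindex idea can be tried on a toy table. The following is only a minimal sketch: `table`, `pairs` and the numbers in them are made up for illustration and are not the Cij from the question. It builds the keys with `pd.MultiIndex.from_arrays` instead of `zip`, which is an equivalent way of spelling the same (y, x) pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical small lookup table standing in for Cij (values are made up).
table = pd.DataFrame(
    [[16.0, 15.0, np.nan], [np.nan, 12.0, 6.0]],
    index=pd.Index([1, 2], name="y"),
    columns=pd.Index([1, 2, 3], name="x"),
)
# Hypothetical (y, x) pairs to look up; (2, 1) has no value in the table.
pairs = pd.DataFrame({"y": [1, 2, 1, 2], "x": [2, 3, 1, 1]})

# stack() turns the 2-D table into a Series keyed by (y, x).
stacked = table.stack()

# Reindex that Series by the (y, x) pairs, one key per row of `pairs`.
keys = pd.MultiIndex.from_arrays([pairs["y"], pairs["x"]])
pairs["value"] = stacked.reindex(keys).to_numpy()
```

Pairs that have no entry in the table simply come back as NaN here, so it may be worth checking for missing values afterwards if every pair is expected to be present.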

叹倦 2025-02-17 06:30:45

Using the factorize path in the docs, you can replicate the lookup functionality:

x_index, x_uniques = pd.factorize(df.x)

arrays = (Cij
          .reindex(columns = x_uniques)
          .to_numpy()[df.y.factorize()[0], x_index]
         )

df['r'] = arrays

df
    y  x     r   Crc
0   1  1  16.0  16.0
1   1  1  16.0  16.0
2   1  2  15.0  15.0
3   1  2  15.0  15.0
4   1  2  15.0  15.0
5   1  3  13.0  13.0
6   1  3  13.0  13.0
7   1  5   5.0   5.0
8   2  2    12  12.0
9   2  4    15  15.0
10  2  4    15  15.0
11  2  4    15  15.0
12  2  4    15  15.0
13  2  4    15  15.0
14  3  3   6.0   6.0
15  3  3   6.0   6.0
16  3  5  13.0  13.0
17  3  5  13.0  13.0
18  3  5  13.0  13.0
19  3  6  14.0  14.0
20  3  6  14.0  14.0
21  3  6  14.0  14.0
22  3  6  14.0  14.0
23  3  6  14.0  14.0
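
One thing to keep in mind with the snippet above: the row codes come from `df.y.factorize()`, which numbers the y values in order of first appearance. That happens to match the row order of Cij in the example because df is sorted by y. A variant that looks up row and column positions from the labels themselves could look like the sketch below. It uses `Index.get_indexer`, which is a swapped-in technique rather than the factorize recipe, and `table` and `pairs` are made-up stand-ins for Cij and df.

```python
import numpy as np
import pandas as pd

# Hypothetical small lookup table standing in for Cij (values are made up).
table = pd.DataFrame(
    [[16.0, 15.0, np.nan], [np.nan, 12.0, 6.0]],
    index=pd.Index([1, 2], name="y"),
    columns=pd.Index([1, 2, 3], name="x"),
)
# Rows are deliberately not sorted by y.
pairs = pd.DataFrame({"y": [2, 1, 2, 1], "x": [3, 2, 2, 1]})

# Translate labels into integer positions along each axis of the table.
row_pos = table.index.get_indexer(pairs["y"])
col_pos = table.columns.get_indexer(pairs["x"])

# get_indexer marks labels it cannot find with -1; fail loudly in that case.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("some (y, x) labels are missing from the table")

# Pointwise NumPy indexing: one value per (row, column) position pair.
pairs["value"] = table.to_numpy()[row_pos, col_pos]
```

This assumes the table's index and columns contain no duplicate labels, which holds for a crosstab.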

溺ぐ爱和你が 2025-02-17 06:30:45

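If you have checked Cij.loc[df["y"], df["x"]], you will notice that it returns a two-dimensional result, with one row per entry of df["y"] and one column per entry of df["x"]. By comparing this with Cij.lookup(df["y"], df["x"]), you will also notice that the leading diagonal holds the same values (which makes sense, since the i-th diagonal element pairs the i-th y with the i-th x). Therefore, you can apply np.diagonal (with numpy imported as np) to return what you need: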

df["Crc"] = np.diagonal(Cij.loc[df["y"], df["x"]])
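
To see why the diagonal works, and what it costs, here is a small sketch on made-up data (`table` and `pairs` are hypothetical stand-ins for Cij and df). The intermediate result has one row and one column per entry of `pairs`, so its size grows with the square of the number of rows, which may be impractical when df is very large.

```python
import numpy as np
import pandas as pd

# Hypothetical small lookup table standing in for Cij (values are made up).
table = pd.DataFrame(
    [[16.0, 15.0, np.nan], [np.nan, 12.0, 6.0]],
    index=pd.Index([1, 2], name="y"),
    columns=pd.Index([1, 2, 3], name="x"),
)
pairs = pd.DataFrame({"y": [1, 2, 1], "x": [2, 3, 1]})

# .loc with two label lists selects every row label against every column
# label, so the result is len(pairs) x len(pairs), not one value per pair.
block = table.loc[pairs["y"], pairs["x"]]

# Element (i, i) pairs the i-th y with the i-th x, i.e. the pointwise lookup.
diag = np.diagonal(block.to_numpy())
```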
