Pandas vectorized lookup without the deprecated lookup()
My problem concerns lookup(), which is to be deprecated, so I'm looking for an alternative. The documentation suggests using .loc (which does not seem to work with a vectorized approach) or melt() (which seems quite convoluted). It also suggests factorize(), which (I think) does not work for my setup.
Here is the problem:
I have a 2-column DataFrame with x,y-values.
import random
import pandas as pd

k = 20
y = random.choices(range(1, 4), k=k)
x = random.choices(range(1, 7), k=k)
tuples = list(zip(x, y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df
And I have several DataFrames in the crosstab format of df. For example, one called Cij:
Concordance table (Cij):
x     1    2    3    4    5    6  RTotal
y
1    16   15   13  NaN    5  NaN     108
2   NaN   12  NaN   15  NaN  NaN      87
3   NaN  NaN    6  NaN   13   14     121
I now want to perform a vectorized lookup in Cij from the xy-pairs in df to generate a new column Crc in df. So far this looked like this (plain and simple):
df["Crc"] = Cij.lookup(df["y"],df["x"])
How can I achieve the same thing without lookup()? Or did I just not understand the suggested alternatives?
Thanks in advance!
Addendum: Working code example as requested.
import pandas as pd

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])

# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
# one of several DataFrames in ct_a layout
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns)

# row-wise, then column-wise filling of Cij
for i in range(ct_a.shape[0]):
    for j in range(ct_a.shape[1]):
        if ct_a.iloc[i, j] != 0:
            Cij.iloc[i, j] = ct_a.iloc[i+1:, j+1:].sum().sum() + ct_a.iloc[:i, :j].sum().sum()

# vectorized lookup, to be substituted with a future-proof method
df["Crc"] = Cij.lookup(df["y"], df["x"])
Note: In this case the loop-based "filling" of Cij is fine, since crosstabs of df are always small. However, df itself can be very large, so a vectorized lookup is a necessity.
IIUC, you can stack Cij and then reindex based on a list of tuples created by using zip.
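The answer's original code did not survive on this page, so the following is only a minimal sketch of the stack-then-reindex idea. It assumes Cij holds the values from the concordance table in the question (without the RTotal column); the small df of (y, x) pairs is made up for illustration.

```python
import numpy as np
import pandas as pd

# Cij as shown in the question's table (RTotal column left out).
Cij = pd.DataFrame(
    [[16, 15, 13, np.nan, 5, np.nan],
     [np.nan, 12, np.nan, 15, np.nan, np.nan],
     [np.nan, np.nan, 6, np.nan, 13, 14]],
    index=pd.Index([1, 2, 3], name="y"),
    columns=pd.Index([1, 2, 3, 4, 5, 6], name="x"),
)

# Hypothetical sample of (y, x) pairs; the last pair points at an empty cell.
df = pd.DataFrame({"y": [1, 1, 2, 3, 3, 2], "x": [1, 3, 4, 5, 6, 1]})

# stack() turns the table into a Series indexed by (y, x) pairs.
stacked = Cij.stack()

# reindex() with one (y, x) tuple per row of df picks the matching cells;
# pairs that are missing from the stacked Series come back as NaN.
pairs = list(zip(df["y"], df["x"]))
df["Crc"] = stacked.reindex(pairs).to_numpy()
```

Because the lookup runs over the stacked index rather than in a Python loop over rows, it should remain usable when df is large and Cij is small.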
Using the factorize path in the docs, you can replicate the lookup functionality.
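The code for this answer is also missing here. A rough sketch of how the factorize approach might be applied to this setup follows; the Cij values come from the table in the question (RTotal omitted), and the sample df and the names row_codes, col_codes and aligned are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Cij as shown in the question's table (RTotal column left out).
Cij = pd.DataFrame(
    [[16, 15, 13, np.nan, 5, np.nan],
     [np.nan, 12, np.nan, 15, np.nan, np.nan],
     [np.nan, np.nan, 6, np.nan, 13, 14]],
    index=pd.Index([1, 2, 3], name="y"),
    columns=pd.Index([1, 2, 3, 4, 5, 6], name="x"),
)

# Hypothetical sample of (y, x) pairs; the last pair points at an empty cell.
df = pd.DataFrame({"y": [1, 1, 2, 3, 3, 2], "x": [1, 3, 4, 5, 6, 1]})

# factorize() returns integer codes plus the unique labels they refer to.
row_codes, row_labels = pd.factorize(df["y"])
col_codes, col_labels = pd.factorize(df["x"])

# Reorder Cij so that position k on each axis corresponds to code k,
# then pick one cell per row of df with NumPy integer indexing.
aligned = Cij.reindex(index=row_labels, columns=col_labels).to_numpy()
df["Crc"] = aligned[row_codes, col_codes]
```

This sketch assumes df["y"] and df["x"] contain no missing values, since factorize() encodes those as -1, which NumPy would read as "last position".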
If you have checked Cij.loc[df["y"], df["x"]], you will notice that it returns a two-dimensional result, with one row per requested y label and one column per requested x label. By comparing this with Cij.lookup(df["y"], df["x"]), you will also notice that the leading diagonal is the same (which makes sense). Therefore, you can add np.diagonal to return what you need.
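The snippet that accompanied this answer is not preserved, so here is a minimal sketch of the idea. It assumes the Cij values from the question's table (RTotal omitted) and a made-up df of (y, x) pairs; the name full is hypothetical.

```python
import numpy as np
import pandas as pd

# Cij as shown in the question's table (RTotal column left out).
Cij = pd.DataFrame(
    [[16, 15, 13, np.nan, 5, np.nan],
     [np.nan, 12, np.nan, 15, np.nan, np.nan],
     [np.nan, np.nan, 6, np.nan, 13, 14]],
    index=pd.Index([1, 2, 3], name="y"),
    columns=pd.Index([1, 2, 3, 4, 5, 6], name="x"),
)

# Hypothetical sample of (y, x) pairs; the last pair points at an empty cell.
df = pd.DataFrame({"y": [1, 1, 2, 3, 3, 2], "x": [1, 3, 4, 5, 6, 1]})

# .loc with two label sequences selects every y label against every x label,
# giving a len(df) x len(df) table.
full = Cij.loc[df["y"], df["x"]]

# The k-th diagonal entry pairs the k-th y with the k-th x, which is the
# value lookup() would have returned for row k.
df["Crc"] = np.diagonal(full.to_numpy()).copy()
```

Note that the intermediate table has len(df) squared cells, so this variant is probably only practical for a small df; for the very large df mentioned in the question, one of the other approaches is likely a better fit.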