搜索熊猫列之间的范围内的值（不是日期列，没有SQL）

发布于 2025-01-28 14:17:50 字数 596 浏览 1 评论 0原文

事先感谢您的帮助。我有两个数据框如下所示。我需要根据尺寸框架中的信息在销售框架中创建列类别。它应该检查该产品和返回组的最小值内的产品和最大尺寸。可以在熊猫中做吗？不是SQL。我认为合并和加入方法在这里无法使用。

size=pd.DataFrame({"Min Size":[30,41,40],
                   "Max Size":[40, 60, 50],
                   "Category":['small', 'big', "medium"],
                   "Product":['Apple', 'Apple', "Peach"]})
sold=pd.DataFrame({"Purchase_date":["20/01/2020", "18/02/2020", "01/06/2020"],
                          "Size":[35, 45, 42],
                          "Category":["small","big","medium"],
                          "Product":['Apple', 'Peach', "Apple"]})

原文

thanks in advance for help. I have two dataframes
as given below. I need to create column category in sold frame based on information in size frame. It should check siz of product within Min and Max sizes for this product and return group. Is it possible to do it in pandas? not SQL. I think merge and join method will not work here.

size=pd.DataFrame({"Min Size":[30,41,40],
                   "Max Size":[40, 60, 50],
                   "Category":['small', 'big', "medium"],
                   "Product":['Apple', 'Apple', "Peach"]})
sold=pd.DataFrame({"Purchase_date":["20/01/2020", "18/02/2020", "01/06/2020"],
                          "Size":[35, 45, 42],
                          "Category":["small","big","medium"],
                          "Product":['Apple', 'Peach', "Apple"]})

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

空心空情空意 2025-02-04 14:17:50

熊猫中的连接条件必须完全匹配。它没有SQL中的...和... 子句之间的。

您可以使用numpy广播将中的每一行比较与size中的每个行进行比较，并进行匹配的过滤器：

# Converting everything to numpy for comparison
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) * len(size) matrix whose value
# indicate if row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# For each row in `sold`, find the first row in `size` that
# is True / non-zero
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key

# Result
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)

Joining condition in pandas must be exact match. It doesn't have the BETWEEN ... AND ... clause like in SQL.

You can use numpy broadcast to compare every row in sold to every row in size and filter for a match:

# Converting everything to numpy for comparison
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) * len(size) matrix whose value
# indicate if row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# For each row in `sold`, find the first row in `size` that
# is True / non-zero
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key

# Result
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)

回复收藏 0 原文

~没有更多了~