搜索熊猫列之间的范围内的值(不是日期列,没有SQL)

发布于 2025-01-28 14:17:50 字数 596 浏览 1 评论 0原文

事先感谢您的帮助。我有两个数据框 如下所示。我需要根据尺寸框架中的信息在销售框架中创建列类别。它应该检查该产品和返回组的最小值内的产品和最大尺寸。可以在熊猫中做吗?不是SQL。我认为合并和加入方法在这里无法使用。

size=pd.DataFrame({"Min Size":[30,41,40],
                   "Max Size":[40, 60, 50],
                   "Category":['small', 'big', "medium"],
                   "Product":['Apple', 'Apple', "Peach"]})
sold=pd.DataFrame({"Purchase_date":["20/01/2020", "18/02/2020", "01/06/2020"],
                          "Size":[35, 45, 42],
                          "Category":["small","big","medium"],
                          "Product":['Apple', 'Peach', "Apple"]})

thanks in advance for help. I have two dataframes
as given below. I need to create column category in sold frame based on information in size frame. It should check siz of product within Min and Max sizes for this product and return group. Is it possible to do it in pandas? not SQL. I think merge and join method will not work here.

size=pd.DataFrame({"Min Size":[30,41,40],
                   "Max Size":[40, 60, 50],
                   "Category":['small', 'big', "medium"],
                   "Product":['Apple', 'Apple', "Peach"]})
sold=pd.DataFrame({"Purchase_date":["20/01/2020", "18/02/2020", "01/06/2020"],
                          "Size":[35, 45, 42],
                          "Category":["small","big","medium"],
                          "Product":['Apple', 'Peach', "Apple"]})

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

空心空情空意 2025-02-04 14:17:50

熊猫中的连接条件必须完全匹配。它没有SQL中的...和... 子句之间的

您可以使用numpy广播将中的每一行比较与size中的每个行进行比较,并进行匹配的过滤器:

# Converting everything to numpy for comparison
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) * len(size) matrix whose value
# indicate if row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# For each row in `sold`, find the first row in `size` that
# is True / non-zero
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key

# Result
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)

Joining condition in pandas must be exact match. It doesn't have the BETWEEN ... AND ... clause like in SQL.

You can use numpy broadcast to compare every row in sold to every row in size and filter for a match:

# Converting everything to numpy for comparison
sold_product = sold["Product"].to_numpy()[:, None]
sold_size = sold["Size"].to_numpy()[:, None]

product, min_size, max_size = size[["Product", "Min Size", "Max Size"]].T.to_numpy()

# Compare every row in `sold` to every row in `size`.
# `mask` is a len(sold) * len(size) matrix whose value
# indicate if row i in `sold` matches row j in `size`
mask = (sold_product == product) & (min_size <= sold_size) & (sold_size <= max_size)

# For each row in `sold`, find the first row in `size` that
# is True / non-zero
idx, join_key = mask.nonzero()
sold.loc[idx, "join_key"] = join_key

# Result
sold.merge(
    size[["Category"]],
    how="left",
    left_on="join_key",
    right_index=True,
    suffixes=("_Expected", "_Actual"),
)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文