Trying to organize a large dataset, then determine the average number of days and the standard deviation, using Spyder (Python 3.9)

Posted 2025-01-11 09:08:50


*Edit: Posted the data as a dataframe printout instead of a sheets link.

I have a large data set consisting of ~6.5M rows and 6 columns. Each row belongs to a BrandID (e.g., 01-00058) that identifies a unique item, and the 3 columns I need are BrandID, InventoryDate, and OnHand.

          BrandID  SalesPrice InventoryDate   Size  OnHand  PurchasePrice
0        01-00058        9.28    2018-06-30  750mL       6           6.77
1        01-00058        9.28    2018-07-01  750mL       6           6.77
2        01-00058        9.28    2018-07-02  750mL       6           6.77
3        01-00058        9.28    2018-07-03  750mL     102           6.77
4        01-00058        9.28    2018-07-04  750mL      96           6.77
          ...         ...           ...    ...     ...            ...
6531265  02-90631       12.74    2019-06-26  400mL      60           8.49
6531266  02-90631       12.74    2019-06-27  400mL      60           8.49
6531267  02-90631       12.74    2019-06-28  400mL      60           8.49
6531268  02-90631       12.74    2019-06-29  400mL      60           8.49
6531269  02-90631       12.74    2019-06-30  400mL      60           8.49

[6531270 rows x 6 columns]

I would like to determine how many days each particular BrandID has no inventory on hand. For example, BrandID 01-00058 has 27 unique days where OnHand = 0. I would like to summarize that information for all unique BrandIDs.

I would then like to find the mean and standard deviation, across these unique BrandIDs, of the number of days each one is stocked out.

Ideally, I would love this information to be viewed in the variable explorer as a table that reads:

BrandID     Sum OnHand = 0
01-00058    27
01-00061    39
01-00062    14

Answer by 逆流佳人身旁, 2025-01-18 09:08:50


IIUC, try filtering the rows where OnHand is 0 and then counting unique dates with groupby:

>>> df[df["OnHand"].eq(0)].groupby("BrandID")["InventoryDate"].nunique()
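To go from that one-liner to the table and the summary statistics the question asks for, something along these lines may work. It is only a sketch on a tiny made-up frame: the brand labels `A`, `B`, `C`, the sample values, and the names `stockout_days`, `stockout_table` and `StockoutDays` are all assumptions for illustration, and it reads "mean and standard deviation" as statistics of the per-brand stockout-day counts. Note that filtering on `OnHand == 0` drops brands that never ran out, so the sketch adds them back with a count of 0 before averaging; skip that step if only brands with at least one stockout should be included.

```python
import pandas as pd

# Hypothetical miniature of the inventory table (values invented for illustration).
df = pd.DataFrame(
    {
        "BrandID": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "InventoryDate": pd.to_datetime(
            [
                "2018-07-01", "2018-07-02", "2018-07-02", "2018-07-03",
                "2018-07-01", "2018-07-02", "2018-07-03",
                "2018-07-01", "2018-07-02",
            ]
        ),
        "OnHand": [0, 0, 0, 5, 0, 3, 4, 7, 9],
    }
)

# Step 1: keep only zero-stock rows, then count distinct dates per BrandID.
stockout_days = (
    df[df["OnHand"].eq(0)]
    .groupby("BrandID")["InventoryDate"]
    .nunique()
)

# Step 2: brands that never hit zero disappeared in the filter; restore them as 0.
stockout_days = stockout_days.reindex(df["BrandID"].unique(), fill_value=0)

# Step 3: turn the Series into a two-column DataFrame, which Spyder's
# Variable Explorer displays as a table.
stockout_table = (
    stockout_days.rename("StockoutDays")
    .rename_axis("BrandID")
    .reset_index()
)

# Step 4: mean and standard deviation of the per-brand counts.
# pandas' std() uses ddof=1 (sample standard deviation) by default.
mean_days = stockout_table["StockoutDays"].mean()
std_days = stockout_table["StockoutDays"].std()

print(stockout_table)
print(mean_days, std_days)
```

On the real data, `df` would be the existing 6.5M-row frame and only steps 1 to 4 are needed. Pass `ddof=0` to `std()` if the population standard deviation is wanted instead.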


Author: 梦在深巷
