How do I use sklearn's StandardScaler with groupby?

I'm trying to normalize a pandas DataFrame while grouping it based on the dates.

My dataset looks like this:

date        permno  ret    cumret  mom1m  mom3m  mom6m
2004-01-30  80000   0.053  1.497   0.067  0.140  0.137
2004-02-29  80000   0.053  1.497   0.067  0.140  0.137
2004-03-31  80000   0.053  1.497   0.067  0.140  0.137
2004-01-30  80001   0.053  1.497   0.067  0.140  0.137
2004-02-29  80001   0.053  1.497   0.067  0.140  0.137
2004-03-31  80001   0.053  1.497   0.067  0.140  0.137

I'm trying to scale mom1m, mom3m, mom6m based on the dates.

So the first row should be scaled together with the 4th row, the second row with the 5th row, and the third row with the last row.

What I've tried is

crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].apply(lambda x: StandardScaler().fit_transform(x))

where crsp2 is the DataFrame I'm trying to scale and scale_cols is the list of feature columns I'm trying to scale.
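
In case it helps to reproduce the question, the sample frame above can be rebuilt with a sketch like the following (values taken from the table shown earlier; the crsp2 and scale_cols names match the question):

import pandas as pd

# Minimal reproduction of the sample data shown above.
crsp2 = pd.DataFrame({
    'date':   ['2004-01-30', '2004-02-29', '2004-03-31'] * 2,
    'permno': [80000, 80000, 80000, 80001, 80001, 80001],
    'ret':    [0.053] * 6,
    'cumret': [1.497] * 6,
    'mom1m':  [0.067] * 6,
    'mom3m':  [0.140] * 6,
    'mom6m':  [0.137] * 6,
})

# Columns to scale within each date group.
scale_cols = ['mom1m', 'mom3m', 'mom6m']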

Comments (2)

亢潮 2025-02-09 06:20:41

A simpler solution could use scale(), the function form of StandardScaler.

Your code would look like this:

from sklearn.preprocessing import scale

# (optional) set date and permno as a multi-index; note that set_index
# returns a new DataFrame, so assign the result back if you want to keep it:
# crsp2 = crsp2.set_index(["date", "permno"], drop=True)

# columns to scale
scale_cols = ["mom1m", "mom3m", "mom6m"]

# scale each column within its date group
crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].transform(lambda x: scale(x))

Output:

date    permno  ret cumret  mom1m   mom3m   mom6m
0   2004-01-30  80000   0.053   1.497   0.0 0.0 0.0
1   2004-02-29  80000   0.053   1.497   0.0 0.0 0.0
2   2004-03-31  80000   0.053   1.497   0.0 0.0 0.0
3   2004-01-30  80001   0.053   1.497   0.0 0.0 0.0
4   2004-02-29  80001   0.053   1.497   0.0 0.0 0.0
5   2004-03-31  80001   0.053   1.497   0.0 0.0 0.0
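
The all-zero output follows from the sample data: within each date group the momentum columns are constant, so scale() subtracts the group mean and, because sklearn replaces a zero standard deviation with 1, the result is 0. A quick group-wise sanity check, as a sketch assuming the crsp2 and scale_cols from the question:

# After scaling, every date group has zero mean.
# With this constant sample data the per-group std stays 0;
# on real data it would be 1 (scale() uses the population std, ddof=0).
print(crsp2.groupby('date')[scale_cols].mean())
print(crsp2.groupby('date')[scale_cols].std(ddof=0))
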
初熏 2025-02-09 06:20:41

Thanks to this answer, you can do what you want with the example code below.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'group':  [1, 1, 1, 1, 2, 2, 2, 2],
    'value':  [1, 2, 3, 4, 5, 6, 9, 11],
    'value2': [2, 3, 3, 2, 10, 8, 11, 10]
})

# Scale each column within its group: reshape to 2-D for the scaler,
# then flatten the result back to 1-D for assignment.
df[['value', 'value2']] = df.groupby('group').transform(
    lambda x: StandardScaler().fit_transform(x.values[:, np.newaxis]).ravel()
)
group  value      value2
1      -1.34164   -1
1      -0.447214   1
1       0.447214   1
1       1.34164   -1
2      -1.15311    0.229416
2      -0.733799  -1.60591
2       0.524142   1.14708
2       1.36277    0.229416
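
Applied to the question's frame, the same pattern would look like the following sketch (reusing the crsp2 and scale_cols names from the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler per date group: reshape each column to 2-D for the scaler,
# then flatten the result back to 1-D before assigning it to the frame.
crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].transform(
    lambda x: StandardScaler().fit_transform(x.values[:, np.newaxis]).ravel()
)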