How do I use sklearn's StandardScaler with groupby?

I'm trying to normalize a pandas DataFrame while grouping it based on the dates.

My dataset looks like this:

date        permno  ret    cumret  mom1m  mom3m  mom6m
2004-01-30  80000   0.053  1.497   0.067  0.140  0.137
2004-02-29  80000   0.053  1.497   0.067  0.140  0.137
2004-03-31  80000   0.053  1.497   0.067  0.140  0.137
2004-01-30  80001   0.053  1.497   0.067  0.140  0.137
2004-02-29  80001   0.053  1.497   0.067  0.140  0.137
2004-03-31  80001   0.053  1.497   0.067  0.140  0.137

I'm trying to scale mom1m, mom3m, mom6m based on the dates.

So the first row should be scaled together with the 4th row, the second row with the 5th row, and the third row with the last row.

What I've tried is

crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].apply(lambda x: StandardScaler().fit_transform(x))

where crsp2 is the DataFrame I'm trying to scale and scale_cols is the list of feature columns I'm trying to scale.
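
In case it helps to reproduce the question, the sample frame above can be rebuilt with a sketch like the following (values taken from the table shown earlier; the crsp2 and scale_cols names match the question):

import pandas as pd

# Minimal reproduction of the sample data shown above.
crsp2 = pd.DataFrame({
    'date':   ['2004-01-30', '2004-02-29', '2004-03-31'] * 2,
    'permno': [80000, 80000, 80000, 80001, 80001, 80001],
    'ret':    [0.053] * 6,
    'cumret': [1.497] * 6,
    'mom1m':  [0.067] * 6,
    'mom3m':  [0.140] * 6,
    'mom6m':  [0.137] * 6,
})

# Columns to scale within each date group.
scale_cols = ['mom1m', 'mom3m', 'mom6m']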

Comments (2)

亢潮 2025-02-09 06:20:41

A simpler solution could use scale(), the function form of StandardScaler.

Your code would look like this:

from sklearn.preprocessing import scale

# (optional) set date and permno as a multi-index; note that set_index
# returns a new DataFrame, so assign the result back if you want to keep it:
# crsp2 = crsp2.set_index(["date", "permno"], drop=True)

# columns to scale
scale_cols = ["mom1m", "mom3m", "mom6m"]

# scale each column within its date group
crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].transform(lambda x: scale(x))

Output:

date    permno  ret cumret  mom1m   mom3m   mom6m
0   2004-01-30  80000   0.053   1.497   0.0 0.0 0.0
1   2004-02-29  80000   0.053   1.497   0.0 0.0 0.0
2   2004-03-31  80000   0.053   1.497   0.0 0.0 0.0
3   2004-01-30  80001   0.053   1.497   0.0 0.0 0.0
4   2004-02-29  80001   0.053   1.497   0.0 0.0 0.0
5   2004-03-31  80001   0.053   1.497   0.0 0.0 0.0
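
The all-zero output follows from the sample data: within each date group the momentum columns are constant, so scale() subtracts the group mean and, because sklearn replaces a zero standard deviation with 1, the result is 0. A quick group-wise sanity check, as a sketch assuming the crsp2 and scale_cols from the question:

# After scaling, every date group has zero mean.
# With this constant sample data the per-group std stays 0;
# on real data it would be 1 (scale() uses the population std, ddof=0).
print(crsp2.groupby('date')[scale_cols].mean())
print(crsp2.groupby('date')[scale_cols].std(ddof=0))
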
初熏 2025-02-09 06:20:41

Thanks to this answer, you can do what you want with the example code below.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'group':  [1, 1, 1, 1, 2, 2, 2, 2],
    'value':  [1, 2, 3, 4, 5, 6, 9, 11],
    'value2': [2, 3, 3, 2, 10, 8, 11, 10]
})

# Scale each column within its group: reshape to 2-D for the scaler,
# then flatten the result back to 1-D for assignment.
df[['value', 'value2']] = df.groupby('group').transform(
    lambda x: StandardScaler().fit_transform(x.values[:, np.newaxis]).ravel()
)
group  value      value2
1      -1.34164   -1
1      -0.447214   1
1       0.447214   1
1       1.34164   -1
2      -1.15311    0.229416
2      -0.733799  -1.60591
2       0.524142   1.14708
2       1.36277    0.229416
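
Applied to the question's frame, the same pattern would look like the following sketch (reusing the crsp2 and scale_cols names from the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler per date group: reshape each column to 2-D for the scaler,
# then flatten the result back to 1-D before assigning it to the frame.
crsp2[scale_cols] = crsp2.groupby('date')[scale_cols].transform(
    lambda x: StandardScaler().fit_transform(x.values[:, np.newaxis]).ravel()
)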