Normalize/scale a DataFrame column to a given range

Posted 2025-01-20 03:19:48

I have the following DataFrame:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'DateTime': {0: Timestamp('2022-02-08 00:00:00'),
  1: Timestamp('2022-02-08 00:10:00'),
  2: Timestamp('2022-02-08 00:20:00'),
  3: Timestamp('2022-02-08 00:30:00'),
  4: Timestamp('2022-02-08 00:40:00')},
 'wind power [W]': {0: 83.9, 1: 57.2, 2: 58.2, 3: 48.0, 4: 69.5}})
print(df)

             DateTime  wind power [W]
0 2022-02-08 00:00:00            83.9
1 2022-02-08 00:10:00            57.2
2 2022-02-08 00:20:00            58.2
3 2022-02-08 00:30:00            48.0
4 2022-02-08 00:40:00            69.5

As you can see, 83.9 is the maximum value in the second column and 48.0 is the minimum. I want to rescale these values to the range 0.6 to 8.4, so that 83.9 becomes 8.4 and 48.0 becomes 0.6, with the other values falling in between.
So far I have only managed to normalize the column to the range 0-1 with this code:

df['normalized'] = (df['wind power [W]']-df['wind power [W]'].min())/(df['wind power [W]'].max()-df['wind power [W]'].min())

I don't know how to proceed from here to get these numbers into my desired range. Can someone help me, please?

Comments (2)

贪了杯 2025-01-27 03:19:48

We can use MinMaxScaler to perform the feature scaling. It supports a parameter called feature_range, which lets us specify the desired range of the transformed data:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0.6, 8.4))
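# MinMaxScaler expects 2-D input, so select the column as a one-column DataFrame
# and flatten the (n, 1) result back to 1-D before assigning it.
df['normalized'] = scaler.fit_transform(df[['wind power [W]']]).ravel()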

Alternatively, if you don't want to use MinMaxScaler, here is a way to scale the data in pandas only:

w = df['wind power [W]'].agg(['min', 'max'])
norm = (df['wind power [W]'] - w['min']) / (w['max'] - w['min'])
df['normalized'] = norm * (8.4 - 0.6) + 0.6

print(df)

             DateTime  wind power [W]  normalized
0 2022-02-08 00:00:00            83.9    8.400000
1 2022-02-08 00:10:00            57.2    2.598886
2 2022-02-08 00:20:00            58.2    2.816156
3 2022-02-08 00:30:00            48.0    0.600000
4 2022-02-08 00:40:00            69.5    5.271309
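
To see where a single number in that output comes from, here is a small sketch that works through the row with 57.2 by hand. It uses plain Python only; the variable names are illustrative and not part of the original answer.

```python
# Worked example for one value from the question's column.
x, x_min, x_max = 57.2, 48.0, 83.9   # value, column minimum, column maximum
a, b = 0.6, 8.4                      # target range

unit = (x - x_min) / (x_max - x_min)  # position of x within 0-1
scaled = unit * (b - a) + a           # stretch to width b - a, then shift by a

print(round(unit, 6), round(scaled, 6))  # 0.256267 2.598886
```

The 0-1 step is exactly what the question already computes; multiplying by the width of the target range (8.4 - 0.6 = 7.8) and adding the lower bound is the only missing piece.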

眼眸里的那抹悲凉 2025-01-27 03:19:48

If you don't want to use sklearn, you can use the Wikipedia definition of feature scaling (min-max rescaling to an arbitrary range [a, b]):

a = 0.6
b = 8.4
x = df['wind power [W]']

df['normalized'] = a + (x - x.min()) * (b - a) / (x.max() - x.min())
print(df)

# Output
             DateTime  wind power [W]  normalized
0 2022-02-08 00:00:00            83.9    8.400000
1 2022-02-08 00:10:00            57.2    2.598886
2 2022-02-08 00:20:00            58.2    2.816156
3 2022-02-08 00:30:00            48.0    0.600000
4 2022-02-08 00:40:00            69.5    5.271309
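
If the same rescaling is needed for several columns or ranges, the formula above could be wrapped in a small helper. This is only a sketch: the function name rescale and the handling of a constant column (where x.max() - x.min() is zero and the formula would divide by zero) are assumptions, not something the answers specify.

```python
import pandas as pd


def rescale(x: pd.Series, a: float, b: float) -> pd.Series:
    """Linearly map x so that x.min() becomes a and x.max() becomes b."""
    lo, hi = x.min(), x.max()
    span = hi - lo
    if span == 0:
        # Assumption: a constant column has no spread to map,
        # so every value is placed at the lower bound a.
        return pd.Series(a, index=x.index, dtype=float)
    return a + (x - lo) * (b - a) / span


df = pd.DataFrame({'wind power [W]': [83.9, 57.2, 58.2, 48.0, 69.5]})
df['normalized'] = rescale(df['wind power [W]'], 0.6, 8.4)
print(df)
```

With the question's data this should reproduce the normalized column shown above. Choosing the lower bound for a constant column is one possible convention; raising an error would be an equally reasonable choice.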