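Adding a linear regression line to multiple scatter subplots, coloured differently depending on whether the slope is positive or negative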
I'm quite lost on how to retrieve my x and y for calling the polyfit function.
As the title states, on each subplot my code produces I want to draw a linear regression line,
coloured, say, red if the slope is negative or green if it is positive.
My code for drawing the figure is:
def show_monthly_temp(tmax):
    tmax_grouped_avg = tmax.groupby(tmax.index.strftime("%m/%Y")).mean()
    tmax_grouped_avg['datetime'] = pd.to_datetime(tmax_grouped_avg.index)
    tmax_grouped_avg['Year'] = tmax_grouped_avg['datetime'].dt.year
    groups = tmax_grouped_avg.sort_values('datetime').groupby(tmax_grouped_avg['datetime'].dt.month)
    groups_df = pd.DataFrame(groups)
    groups_df.to_csv("gaga")
    f, axes = plt.subplots(nrows=3, ncols=4, figsize=(12, 6))
    for (grp_id, grp_df), ax in zip(groups, axes.ravel()):
        print(grp_id)
        grp_df.plot.scatter(ax=ax, x='Year', y='TMAX', title=f'{calendar.month_name[grp_id]}', legend=False,
                            sharey=False, sharex=False)
    plt.suptitle('Maximum temperature for each month')
    plt.tight_layout()
    plt.show()
And just in case, the whole code is:
import calendar
import pandas as pd
from datetime import datetime
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


def show_monthly_temp(tmax):
    tmax_grouped_avg = tmax.groupby(tmax.index.strftime("%m/%Y")).mean()
    tmax_grouped_avg['datetime'] = pd.to_datetime(tmax_grouped_avg.index)
    tmax_grouped_avg['Year'] = tmax_grouped_avg['datetime'].dt.year
    groups = tmax_grouped_avg.sort_values('datetime').groupby(tmax_grouped_avg['datetime'].dt.month)
    groups_df = pd.DataFrame(groups)
    groups_df.to_csv("gaga")
    f, axes = plt.subplots(nrows=3, ncols=4, figsize=(12, 6))
    for (grp_id, grp_df), ax in zip(groups, axes.ravel()):
        print(grp_id)
        grp_df.plot.scatter(ax=ax, x='Year', y='TMAX', title=f'{calendar.month_name[grp_id]}', legend=False,
                            sharey=False, sharex=False)
    plt.suptitle('Maximum temperature for each month')
    plt.tight_layout()
    plt.show()


def show_snow_days(snow):
    # transform to a list
    monthsDataFrames = [snow[snow.apply(lambda d: d.index.month == month)].dropna() for month in range(1, 13)]
    ax = plt.subplot(111)
    for i in range(len(monthsDataFrames)):
        ax.boxplot(monthsDataFrames[i].values, positions=[i])
    plt.xticks([i for i in range(12)], [str(month) for month in range(1, 13)])
    plt.show()


if __name__ == '__main__':
    df = pd.read_csv("2961941.csv")
    # set date column as index, drop the 'DATE' column to avoid repetitions + create as datetime object
    # speed up parsing using infer_datetime_format=True.
    df.index = pd.to_datetime(df['DATE'], infer_datetime_format=True)
    # create new tables
    tmax = df.filter(['TMAX'], axis=1).dropna()
    snow = df.filter(['SNOW']).dropna()
    # count number of snow day samples - make sure at least >= 28
    snow_grouped = snow.groupby(pd.Grouper(level='DATE', freq="M")).transform('count')
    snow = (snow[snow_grouped['SNOW'] >= 28])
    # count number of tmax day samples - make sure at least >= 28
    tmax_grouped = tmax.groupby(pd.Grouper(level='DATE', freq="M")).transform('count')
    tmax = (tmax[tmax_grouped['TMAX'] >= 28])
    ################ Until here - initialized data ###############
    show_monthly_temp(tmax)
    show_snow_days(snow)
thanks! :-)
Comments (1)
To calculate the linear regression, I used the 'sklearn' library.
The indices along the x axis and the values themselves are fed to the model,
and the target values are then predicted from those indices.
If the line has a negative slope or is horizontal, it is drawn in red; otherwise it is green. I checked that it works on my data. Substitute your own data into the 'show_monthly_temp' function; I left df in there (don't forget to change it).
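For reference, the same idea can be sketched with np.polyfit, which the question asks about, in place of sklearn. This is not the code from this answer, only a minimal illustration: the helper names fit_trend and add_trend_line are made up here, and it assumes each group has the 'Year' and 'TMAX' columns used in the question. The x values are simply the group's 'Year' column and the y values its 'TMAX' column.

```python
import numpy as np


def fit_trend(x, y):
    """Least-squares fit of y = slope * x + intercept, plus a colour for the line.

    Hypothetical helper: red for a negative or flat slope, green for a positive one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # For deg=1, polyfit returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    color = "green" if slope > 0 else "red"
    return slope, intercept, color


def add_trend_line(ax, grp_df, x_col="Year", y_col="TMAX"):
    """Draw the fitted line for one month's group on an existing axes object."""
    x = np.asarray(grp_df[x_col], dtype=float)
    slope, intercept, color = fit_trend(x, grp_df[y_col])
    ax.plot(x, slope * x + intercept, color=color)
    return slope
```

Inside the loop from the question, this would be called as add_trend_line(ax, grp_df) right after grp_df.plot.scatter(...). Note that with real data a visually flat line will rarely have a slope of exactly zero, so its colour is decided by the sign of a very small number.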
My dataframe looks like this:
How to get the data frame: