Pandas scatter plot filtered by multiple row conditions and x, y column values

Posted 2025-02-12 11:06:06

Thanks in advance for your ideas. I have been trying to make scatter plots in a loop. The loop filters the rows on two conditions (site and hour), then plots one column as the x values and another column as the y values, using only the rows that meet both conditions. My data looks like this:

site_name   power_1   wind_speed   month   year   day   hour   power_2
A           50        5.5          1       2021   2     5      60
A           75        5.9          2       2021   8     17     70
A           40        7.3          5       2021   11    20     85
B           80        8.1          4       2021   1     4      90
B           84        8.2          7       2021   18    5      92
B           46        6.1          10      2021   23    11     41

I am trying to plot each site in its own scatter plot, with x = wind_speed and y = power_1, and a different color for each hour. For the sample above, that means two scatter plots of wind speed vs. power (one for site A, one for site B). Each plot has three points in different colors. I hope this makes sense.

I have tried a nested-loop structure: an outer loop over the sites (A, B) and an inner loop over the hours to set the color of each x, y point.

My actual code runs on a much larger dataset than the sample above, where the columns are named plant_name, wind_speed_ms and power_kwh. It looks like the code below, and it produces a blank plot:

#PLOT ALL HOURS OF THE MONTHS/YEARS - WIND SPEED vs POWER
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches
from scipy.interpolate import interp1d

# Unique site names as a plain list
sites = (dfc1.plant_name.unique())
sites = sites.tolist()
# One integer level per unique hour, used to pick a color from the Paired colormap
levels, categories = pd.factorize(dfc1.hour.unique())
colors = [plt.cm.Paired(k) for k in levels]
# Legend entries: one colored patch per hour
handles = [matplotlib.patches.Patch(color=plt.cm.Paired(k), label=c) for k, c in enumerate(categories)]
#fig, ax = plt.subplots(figsize=(10,4))

for i in range(len(sites)):  # outer loop: one plot per site
    #fig = plt.figure()
    for j in np.arange(0,24): #24 HOURS AND 1 COLOR FOR EACH UNIQUE HOUR

        x = dfc1.loc[dfc1['plant_name']==sites[i]].groupby(['hour']).wind_speed_ms
        y = dfc1.loc[dfc1['plant_name']==sites[i]].groupby(['hour']).power_kwh
        plt.scatter(x,y, edgecolors=colors[0:j],marker='o',facecolors='none')
        site = str(sites[i])
        plt.title(site + (' ')  + str(dfc1.columns[5]) + (' ') + ('vs') + (' ') + str(dfc1.columns[3]) )
        plt.xlabel('Wind Speed'); plt.ylabel('Power')
        plt.legend(handles=handles, title="Month",loc='center left', bbox_to_anchor=(1,0.5),edgecolor='black')
    #plt.plot(mwsvar.iloc[-1,4], mpvar.iloc[-1,4], c='orange',linestyle=(0,()),marker="o",markersize=7)
    plt.legend()
    plt.show()
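The nested-loop idea can work if each inner iteration actually filters on both row conditions. In the loop above, x and y never depend on j, so every pass plots the same selection. Also, groupby(['hour']).wind_speed_ms returns a grouped object rather than the wind-speed values themselves. Both are likely contributors to the blank plot.

The sketch below is a minimal rewrite of that structure, not code from the question or the answer. It assumes the sample table's column names (site_name, wind_speed, power_1, hour) rather than the real plant_name / wind_speed_ms / power_kwh. The helper name plot_sites_by_hour is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question; assumed column names, not the real
# dataset's plant_name / wind_speed_ms / power_kwh.
df = pd.DataFrame({
    'site_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'power_1': [50, 75, 40, 80, 84, 46],
    'wind_speed': [5.5, 5.9, 7.3, 8.1, 8.2, 6.1],
    'hour': [5, 17, 20, 4, 5, 11],
})


def plot_sites_by_hour(df, cmap_name='viridis'):
    """Hypothetical helper: one scatter plot per site, one color per hour.

    Returns a dict mapping each site name to its Figure.
    """
    cmap = plt.get_cmap(cmap_name)
    figures = {}
    for site in df['site_name'].unique():          # outer loop: sites
        fig, ax = plt.subplots()
        for hour in sorted(df['hour'].unique()):   # inner loop: hours
            # Boolean mask that applies both row conditions at once
            mask = (df['site_name'] == site) & (df['hour'] == hour)
            if not mask.any():
                continue                           # no rows for this site/hour
            ax.scatter(df.loc[mask, 'wind_speed'], df.loc[mask, 'power_1'],
                       color=cmap(hour / 23),      # hours 0-23 spread over the colormap
                       label=f'Hour {hour}')
        ax.set_title(f'Site {site}: wind speed vs power_1')
        ax.set_xlabel('Wind speed')
        ax.set_ylabel('power_1')
        ax.legend(title='Hour')
        figures[site] = fig
    return figures
```

Call plot_sites_by_hour(df) and then plt.show() to display the figures. Because the color is computed from the hour value itself, a given hour gets the same color on every site's plot.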

Answer by 守望孤独, 2025-02-19 11:06:07

I think you're very close. Here's a solution using matplotlib. It is somewhat long and unwieldy, but I think it's correct. After that, I also show a different library, seaborn, which makes plots like this much easier.

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'site_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'power_1': [50, 75, 40, 80, 84, 46],
    'wind_speed': [5.5, 5.9, 7.3, 8.1, 8.2, 6.1],
    'month': [1, 2, 5, 4, 7, 10],
    'year': [2021, 2021, 2021, 2021, 2021, 2021],
    'day': [2, 8, 11, 1, 18, 23],
    'hour': [5, 17, 20, 4, 5, 11],
    'power_2': [60, 70, 85, 90, 92, 41],
})

#Matplotlib approach
# mpl.cm.get_cmap was deprecated in Matplotlib 3.7 and removed in 3.9
cmap = mpl.colormaps['Blues']
# Different color for each hour 0-23 (keys must match the values in the 'hour' column)
hour_colors = {h: cmap((h + 1) / 24) for h in range(24)}

for site_name, site_df in df.groupby('site_name'):
    fig, ax = plt.subplots()  # one figure per site
    for hour, hour_df in site_df.groupby('hour'):
        x = hour_df['wind_speed']
        y = hour_df['power_1']
        color = hour_colors[hour]
        ax.scatter(x, y, color=color, label=f'Hour {hour}')

    ax.legend()
    ax.set_title(f'Station {site_name}')
    ax.set_xlabel('Wind speed')
    ax.set_ylabel('Power 1')
    plt.show()
    plt.close(fig)

#Seaborn approach (different library)
import seaborn as sns
sns.relplot(
    data=df,
    x='wind_speed',
    y='power_1',
    col='site_name',  # one subplot per site
    hue='hour',       # color points by hour
)
plt.show()
plt.close()
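With up to 24 hours in the real data, a per-hour legend gets long. One alternative, swapped in here and not part of the answer above, draws each site's points in a single scatter call. Scatter's c= argument maps the hour through a colormap, and a colorbar replaces the legend. This is roughly a matplotlib counterpart of the seaborn col/hue call.

The sketch below assumes the sample column names. The helper name plot_sites_side_by_side is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question (assumed column names)
df = pd.DataFrame({
    'site_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'power_1': [50, 75, 40, 80, 84, 46],
    'wind_speed': [5.5, 5.9, 7.3, 8.1, 8.2, 6.1],
    'hour': [5, 17, 20, 4, 5, 11],
})


def plot_sites_side_by_side(df, cmap='viridis'):
    """Hypothetical helper: one subplot per site, point color encodes the hour."""
    sites = df['site_name'].unique()
    fig, axes = plt.subplots(1, len(sites), sharex=True, sharey=True,
                             figsize=(5 * len(sites), 4), squeeze=False)
    for ax, site in zip(axes[0], sites):
        site_df = df[df['site_name'] == site]
        # c= maps each row's hour through the colormap; the fixed vmin/vmax
        # give the same hour the same color on every subplot
        points = ax.scatter(site_df['wind_speed'], site_df['power_1'],
                            c=site_df['hour'], cmap=cmap, vmin=0, vmax=23)
        ax.set_title(f'Site {site}')
        ax.set_xlabel('Wind speed')
    axes[0, 0].set_ylabel('power_1')
    # One shared colorbar instead of a legend entry per hour
    fig.colorbar(points, ax=axes[0].tolist(), label='Hour')
    return fig
```

Call plot_sites_side_by_side(df) followed by plt.show(). The trade-off is that a continuous colorbar makes exact hours harder to read than a legend. In exchange, it stays compact when all 24 hours are present.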