How do I plot a scatter graph comparing two DataFrames?

Posted on 2025-01-16 17:04:19


I have two separate DataFrames, which both contain rainfall amounts and dates corresponding to them.

df1:

                 time     tp
0    2013-01-01 00:00:00  0.0
1    2013-01-01 01:00:00  0.0
2    2013-01-01 02:00:00  0.0
3    2013-01-01 03:00:00  0.0
4    2013-01-01 04:00:00  0.0
                 ...  ...
8755 2013-12-31 19:00:00  0.0
8756 2013-12-31 20:00:00  0.0
8757 2013-12-31 21:00:00  0.0
8758 2013-12-31 22:00:00  0.0
8759 2013-12-31 23:00:00  0.0

[8760 rows x 2 columns]

df2:

                 time         tp
0     2013-07-18T18:00:01  0.002794
1     2013-07-18T20:00:00  0.002794
2     2013-07-18T21:00:00  0.002794
3     2013-07-18T22:00:00  0.002794
4     2013-07-19T00:00:00  0.000000
                  ...       ...
9656  2013-12-30T13:30:00  0.000000
9657  2013-12-30T23:30:00  0.000000
9658  2013-12-31T00:00:00  0.000000
9659  2013-12-31T00:00:00  0.000000
9660  2014-01-01T00:00:00  0.000000

[9661 rows x 2 columns]

I'm trying to plot a scatter graph comparing the two data frames. The way I'm doing it is by choosing a specific date and time and plotting the df1 tp on one axis and df2 tp on the other axis.

For example, if the date/time in both DataFrames is 2013-12-31 19:00:00, then plot tp from df1 on the x-axis and tp from df2 on the y-axis.

To solve this, I tried using the following:

df1['dates_match'] = np.where(df1['time'] == df2['time'], 'True', 'False')

which would tell me whether the dates match, and if they do I can plot. The problem is that the two DataFrames have different numbers of rows, and most methods only allow comparing DataFrames with exactly the same number of rows.

Does anyone know of an alternative method I could use to plot the graph?

Thanks in advance!


天暗了我发光 2025-01-23 17:04:19

The main goal is to plot two time series that apparently don't have the same frequency, so that they can be compared.

Since the main issue here is the mismatched timestamps, let's tackle that with pandas resample so that both series share a uniform set of timestamps. To take the sum over 30-minute intervals (feel free to change the interval and the aggregation function if you want to):

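import pandas as pd

# resample() needs a DatetimeIndex, so parse the time columns first
df1["time"] = pd.to_datetime(df1["time"])
df2["time"] = pd.to_datetime(df2["time"])
df1.set_index("time", inplace=True)
df2.set_index("time", inplace=True)

df1_resampled = df1.resample("30min").sum()  # sum over 30-minute intervals
df2_resampled = df2.resample("30min").sum()  # sum over 30-minute intervals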
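One thing worth checking with this approach: df1 is hourly, so every second 30-minute bin contains no observations, and a plain sum reports those empty bins as 0.0. The sketch below uses a small made-up hourly frame (the name `hourly` and its values are illustrative, not from the question) to show the difference between the default and `min_count=1`, which marks empty bins as NaN instead.

```python
import pandas as pd

# Made-up hourly series standing in for df1 (values are illustrative only).
hourly = pd.DataFrame(
    {"tp": [0.2, 0.3]},
    index=pd.to_datetime(["2013-07-18 18:00:00", "2013-07-18 19:00:00"]),
)

# Default sum: the empty 18:30 bin becomes 0.0, which reads as "no rain".
filled = hourly.resample("30min").sum()

# min_count=1: a bin without any observation becomes NaN instead.
sparse = hourly.resample("30min").sum(min_count=1)

print(filled)
print(sparse)
```

If the NaN variant is used, calling `dropna()` on the joined frame afterwards would keep only the intervals where both series actually have data.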

Now that the timestamps are aligned, you can join the resampled DataFrames if you want to and then plot them:

df_joined = df1_resampled.join(df2_resampled, lsuffix="_1", rsuffix="_2")
df_joined.plot(marker="o", figsize=(12,6))
# df_joined.plot(subplots=True) if you want to plot them separately

Since df1 starts on 2013-01-01 and df2 on 2013-07-18, there is an initial period where only df1 has data. If you want to plot only the overlapping period, you can pass how="inner" when joining the two DataFrames.
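The plots above are line plots over time. For the scatter plot the question asks about (df1's tp on the x-axis against df2's tp on the y-axis), a minimal sketch could look like the following. It assumes an exact match on parsed timestamps is what is wanted; the sample values, the name `pairs`, and the helper `plot_scatter` are made up for illustration.

```python
import pandas as pd

# Tiny stand-in frames (made-up values) mimicking the layout in the question.
df1 = pd.DataFrame({
    "time": ["2013-12-31 18:00:00", "2013-12-31 19:00:00", "2013-12-31 20:00:00"],
    "tp": [0.0, 0.5, 1.0],
})
df2 = pd.DataFrame({
    "time": ["2013-12-31T19:00:00", "2013-12-31T19:30:00",
             "2013-12-31T20:00:00", "2014-01-01T00:00:00"],
    "tp": [0.4, 0.1, 0.9, 0.0],
})

# Parse both time columns so "2013-12-31 19:00:00" and
# "2013-12-31T19:00:00" compare equal as timestamps.
for df in (df1, df2):
    df["time"] = pd.to_datetime(df["time"])

# An inner merge keeps only timestamps present in both frames,
# no matter how many rows each frame has.
pairs = df1.merge(df2, on="time", how="inner", suffixes=("_1", "_2"))
print(pairs)


def plot_scatter(pairs):
    """Scatter df1's tp (x-axis) against df2's tp (y-axis)."""
    import matplotlib.pyplot as plt  # imported here so the merge runs without it

    ax = pairs.plot.scatter(x="tp_1", y="tp_2", figsize=(6, 6))
    ax.set_xlabel("df1 tp")
    ax.set_ylabel("df2 tp")
    plt.show()
```

Calling `plot_scatter(pairs)` draws the figure. Note that df2 in the question contains repeated timestamps (2013-12-31T00:00:00 appears twice), and a merge yields one row per repeat, so aggregating df2 first (for example by resampling as above) may be preferable.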
