Interpolating datetime data in Python with NumPy

Posted 2025-01-20 18:16:23

I have a .csv file with two columns, which I have imported as a NumPy array. The first column holds datetime values, one per month. The second column holds the corresponding value for that month.

I want to interpolate the data to create a new datetime row for every day, with a corresponding value for each day. If possible, I would also like to add some random noise to the interpolated values, but I know this is a lot to ask.

Here is a sample of the data:

Date,Value
01/06/2010 00:00,42.18
01/07/2010 00:00,43.53
01/08/2010 00:00,39.95
01/09/2010 00:00,41.12
01/10/2010 00:00,43.5
01/11/2010 00:00,46.4
01/12/2010 00:00,58.03
01/01/2011 00:00,48.43
01/02/2011 00:00,46.47
01/03/2011 00:00,51.41
01/04/2011 00:00,50.88
01/05/2011 00:00,50.27
01/06/2011 00:00,50.82

Thanks very much for your help. I know of scipy.interpolate, but I am not sure whether it can work with datetime values.

迷离° 2025-01-27 18:16:23

Assuming your Date column is sorted and contains strings (not dates) in DD/MM/YYYY format, and your Value column contains floats, here is a way to get an interpolated value for every day between the first and last date:

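import datetime as dt

import numpy as np
import pandas as pd

# df is assumed to be a DataFrame already loaded from the CSV
# (for example with pd.read_csv), with "Date" and "Value" columns.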

df[["Date", "Time"]] = df["Date"].str.split(' ', expand=True)
df[["Day", "Month", "Year"]] = df["Date"].str.split('/', expand=True)

first_date = np.array([int(df["Day"].iloc[0]), int(df["Month"].iloc[0]), int(df["Year"].iloc[0])]).flatten()

# I'm trying to get the number of days between date entries so I can turn each date
# into a float with the number being how many days since the 1st day.

col1 = df["Date"].iloc[0:len(df) - 1]
col2 = df["Date"].iloc[1:]

col1 = pd.to_datetime(col1, format='%d/%m/%Y').reset_index()
col2 = pd.to_datetime(col2, format='%d/%m/%Y').reset_index()

# Finding the difference and adding a row at the beginning with 0 days because
# diff is 1 row short; it does not have a value for the 1st date, which should be
# 0 days since the 1st date.

diff = col2 - col1
diff = diff["Date"].dt.days.cumsum()
diff = pd.concat([pd.Series([0]), diff], ignore_index=True)

# original_x holds each date as the number of days since the first date.
original_x = diff.to_numpy().flatten()
final_x_vals = np.arange(0, original_x[-1] + 1, 1)
original_y = df["Value"].to_numpy().astype(float)

final_y_vals = np.interp(final_x_vals, original_x, original_y)

# Function to turn the final_x_vals (i.e. interpolated dates) back to dates.
def num_to_date(nums, first_date):
  first_day, first_month, first_year = first_date
  first_date = dt.datetime(first_year, first_month, first_day, 0,0)
  
  dates = []
  for n in nums:
    new_date = first_date + dt.timedelta(days = int(n))
    dates.append(new_date)

  return dates

final_dates = num_to_date(final_x_vals, first_date)

# df with interpolated values.
new_df = pd.DataFrame({"Date": final_dates, "Value": final_y_vals})

It's very cumbersome and I'm sure there's a more efficient way, but it serves the purpose. Let me know if you have any questions.
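
For comparison, here is a more compact sketch of the same idea. It also covers the random noise the question asked about. It assumes the dates are sorted strings in DD/MM/YYYY HH:MM format, as in the sample data. The function name daily_interpolate and its parameters noise_std and seed are made up for illustration and are not part of NumPy or pandas. Using Gaussian noise, and adding it only to the interpolated days so the original monthly values stay exact, is also an assumption about what is wanted.

```python
import numpy as np
import pandas as pd


def daily_interpolate(dates, values, noise_std=0.0, seed=None):
    """Linearly interpolate sparse dated samples to one value per day.

    dates     : sorted sequence of 'DD/MM/YYYY HH:MM' strings
    values    : matching sequence of numbers
    noise_std : standard deviation of optional Gaussian noise (illustrative)
    seed      : seed for the random generator, for reproducible noise
    """
    stamps = pd.to_datetime(pd.Series(dates), format="%d/%m/%Y %H:%M")

    # np.interp needs numeric x values, so express each date as the
    # number of days elapsed since the first date.
    x = (stamps - stamps.iloc[0]).dt.days.to_numpy(dtype=float)
    y = np.asarray(values, dtype=float)

    # One entry per calendar day, first and last date included.
    daily_dates = pd.date_range(stamps.iloc[0], stamps.iloc[-1], freq="D")
    daily_x = np.arange(len(daily_dates), dtype=float)
    daily_y = np.interp(daily_x, x, y)

    if noise_std > 0:
        rng = np.random.default_rng(seed)
        noise = rng.normal(0.0, noise_std, size=daily_y.shape)
        # Assumption: keep the original samples exact, perturb only
        # the interpolated days.
        noise[np.isin(daily_x, x)] = 0.0
        daily_y = daily_y + noise

    return pd.DataFrame({"Date": daily_dates, "Value": daily_y})


# Usage with the first three rows of the sample data.
sample_dates = ["01/06/2010 00:00", "01/07/2010 00:00", "01/08/2010 00:00"]
sample_values = [42.18, 43.53, 39.95]
print(daily_interpolate(sample_dates, sample_values, noise_std=0.5, seed=0).head())
```

A reasonable noise_std depends on the data. Something well below the typical month-to-month change keeps the overall trend visible.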
