What is the fastest way to sample slices of a numpy array?

Posted on 2024-12-11 08:14:03

I have a 3D (time, X, Y) numpy array containing 6-hourly time series for a few years (say 5). I would like to create a sampled time series containing one instance of each calendar day, randomly taken from the available records (5 possibilities per day), as follows:

Jan 01: 2006
Jan 02: 2011
Jan 03: 2009
...

This means I need to take 4 values from 01/01/2006, 4 values from 02/01/2011, etc.
I have a working version, which works as follows:

Reshape the input array to add a "year" dimension (Time, Year, X, Y)
Create an array of 365 randomly generated integers between 0 and 4
Use np.repeat and the array of integers to extract only the relevant values:

Example:

sampledValues = Variable[np.arange(numberOfDays * ValuesPerDays), sampledYears.repeat(ValuesPerDays),:,:]
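A minimal, self-contained sketch of the three steps above, not the asker's actual code. It assumes the data is stored chronologically with Feb 29 already removed, so the time axis can be reshaped to (year, step) rather than (step, year); the array sizes and all names (`data`, `by_year`, `sampled_years`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 years, 365 days, 4 values per day, small 3x2 grid
n_years, n_days, per_day, nx, ny = 5, 365, 4, 3, 2
steps_per_year = n_days * per_day

# Stand-in for the real (time, X, Y) array, in chronological order
data = rng.random((n_years * steps_per_year, nx, ny))

# Step 1: add a "year" axis. For a C-contiguous array this is a view, not a copy.
by_year = data.reshape(n_years, steps_per_year, nx, ny)

# Step 2: one random year index (0..4) per calendar day
sampled_years = rng.integers(0, n_years, size=n_days)

# Step 3: repeat each year index for the 4 values of its day and
# pair it with the position of each 6-hourly step within the year
year_idx = np.repeat(sampled_years, per_day)
step_idx = np.arange(steps_per_year)
sampled = by_year[year_idx, step_idx]  # shape (1460, nx, ny)
```

Because both index arrays have the same length, numpy pairs them element by element and copies only the selected (X, Y) slices, so the extraction is a single vectorized operation with no Python-level loop.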

This seems to work, but I was wondering whether this is the best/fastest approach to my problem. Speed is important because I am doing this in a loop, and I would benefit from testing as many cases as possible.

Am I doing this right?

Thanks

EDIT
I forgot to mention that I filtered the input dataset to remove February 29 in leap years.
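One possible way to do that filtering with plain numpy `datetime64` arithmetic is sketched below. The date range (2006 through 2010), the placeholder array and the variable names are assumptions for illustration only.

```python
import numpy as np

# 6-hourly timestamps from 2006-01-01 00:00 up to 2010-12-31 18:00 (assumed range)
start = np.datetime64("2006-01-01T00", "h")
end = np.datetime64("2011-01-01T00", "h")
n_steps = int((end - start) / np.timedelta64(6, "h"))
dates = start + np.arange(n_steps) * np.timedelta64(6, "h")

# Calendar month (1..12) and day of month (1..31) for every timestamp
as_month = dates.astype("datetime64[M]")
month = as_month.astype(int) % 12 + 1
day = (dates.astype("datetime64[D]") - as_month.astype("datetime64[D]")).astype(int) + 1

# Boolean mask that drops every Feb 29 record
keep = ~((month == 2) & (day == 29))

# Placeholder for the real (time, X, Y) array
data = np.zeros((n_steps, 3, 2))
filtered = data[keep]  # 5 * 365 * 4 = 7300 time steps remain
```

After this step every year has exactly 365 * 4 records, which is what makes a reshape along the time axis valid.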

Basically, the aim of this operation is to find a 365-day sample that matches the long-term time series well in terms of mean etc. If the sampled time series passes my quality test, I want to export it and start again.
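If the quality test depends only on statistics such as the mean, one option (a suggestion, not something from the original post) is to precompute per-day statistics once and score many candidate year selections without extracting the full arrays. The sketch below assumes the test is "closest overall mean", which is a hypothetical stand-in for the real quality test; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes and stand-in data (Feb 29 already removed)
n_years, n_days, per_day, nx, ny = 5, 365, 4, 3, 2
data = rng.random((n_years * n_days * per_day, nx, ny))

# View the data as (year, day, step, X, Y) and reduce each day to one number
by_day = data.reshape(n_years, n_days, per_day, nx, ny)
daily_mean = by_day.mean(axis=(2, 3, 4))  # shape (n_years, n_days)
long_term_mean = data.mean()

# Draw many candidate selections at once: one year index per day per candidate
n_candidates = 1000
day_idx = np.arange(n_days)
years = rng.integers(0, n_years, size=(n_candidates, n_days))

# Every day contributes the same number of values, so the mean of the
# selected daily means equals the mean of the full sampled series
cand_mean = daily_mean[years, day_idx].mean(axis=1)  # shape (n_candidates,)

# Assumed quality test: keep the candidate whose mean is closest to the long-term mean
best = int(np.argmin(np.abs(cand_mean - long_term_mean)))

# Extract the full series only for the chosen candidate
sample = by_day[years[best], day_idx].reshape(n_days * per_day, nx, ny)
```

The full (1460, X, Y) extraction then happens once per accepted sample instead of once per trial. Statistics that cannot be combined from per-day values would still need the full extraction.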

小梨窩很甜 2024-12-18 08:14:03

The year 2008 was 366 days long, so don't reshape.

Have a look at scikits.timeseries:

import numpy as np
import scikits.timeseries as ts

start_date = ts.Date('H', '2006-01-01 00:00')
end_date = ts.Date('H', '2010-12-31 18:00')
arr3d = ... # your 3D array [time, X, Y]

dates = ts.date_array(start_date=start_date, end_date=end_date, freq='H')[::6]
t = ts.time_series(arr3d, dates=dates)
# just make sure arr3d.shape[0] == len(dates) !

Now you can index t using its day/month/year attributes:

t[np.logical_and(t.day == 1, t.month == 1)]

so for example:

for day_of_year in range(1, 366):
    # upper bound is exclusive, so this draws a year from 2006 to 2010
    year = np.random.randint(2006, 2011)

    day_values = t[np.logical_and(t.day_of_year == day_of_year, t.year == year)]
    # day_values is a [4, X, Y] array with data from that day

Play with the attributes of t to make it work with leap years too.
