What is the fastest way to sample slices of a numpy array?
I have a 3D (time, X, Y) numpy array containing 6-hourly time series for a few years (say 5). I would like to create a sampled time series containing one instance of each calendar day, randomly taken from the available records (5 possibilities per day), as follows:
- Jan 01: 2006
- Jan 02: 2011
- Jan 03: 2009
- ...
This means I need to take 4 values from 01/01/2006, 4 values from 02/01/2011, etc.
I have a working version which works as follows:
- Reshape the input array to add a "year" dimension (Time, Year, X, Y)
- Create an array of 365 randomly generated integers between 0 and 4
- Use np.repeat and the array of integers to extract only the relevant values:
Example:
sampledValues = Variable[np.arange(numberOfDays * ValuesPerDays), sampledYears.repeat(ValuesPerDays),:,:]
This seems to work, but I was wondering whether it is the best/fastest approach to my problem. Speed is important because I am doing this in a loop, and I would benefit from testing as many cases as possible.
Am I doing this right?
Thanks
EDIT
I forgot to mention that I filtered the input dataset to remove 29 February in leap years.
Basically, the aim of the operation is to find a 365-day sample that matches the long-term time series well in terms of mean etc. If the sampled time series passes my quality test, I want to export it and start again.
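For reference, the approach described in the question can be sketched as runnable code. This is only an illustration under stated assumptions: the grid size, the random stand-in data, and the helper name `sample_year` are hypothetical, and the axes are ordered (year, step, X, Y) because that is what a plain reshape of year-after-year data produces, which differs from the (Time, Year, X, Y) order mentioned above.

```python
import numpy as np

# Sizes taken from the question: 5 years, 365 days, 4 values per day.
n_years, n_days, per_day = 5, 365, 4
nx, ny = 3, 2  # small hypothetical grid

rng = np.random.default_rng(0)
# Stand-in for the real data: (time, X, Y), years stored one after another,
# with 29 February already removed so every year has the same length.
data = rng.random((n_years * n_days * per_day, nx, ny))

# Add a "year" axis: (year, step_in_year, X, Y).
# For a contiguous array this reshape is a view, not a copy.
by_year = data.reshape(n_years, n_days * per_day, nx, ny)

def sample_year(by_year, per_day, rng):
    """Pick one random source year per calendar day (hypothetical helper)."""
    n_years, n_steps = by_year.shape[:2]
    n_days = n_steps // per_day
    sampled_years = rng.integers(0, n_years, size=n_days)  # values in 0..n_years-1
    steps = np.arange(n_steps)
    # Pairing two index arrays picks (year, step) element by element.
    return by_year[sampled_years.repeat(per_day), steps], sampled_years

sampled, years = sample_year(by_year, per_day, rng)  # sampled: (1460, 3, 2)
```

The fancy-indexing step does a single gather over the first two axes, so each call copies only one synthetic year of data rather than the whole array.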
Comments (2)
The year 2008 was 366 days long, so don't reshape.
Have a look at scikits.timeseries:
Now you can access the t data with day/month/year objects, so for example:
Play with the attributes of t to make it work with leap years too.
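On the leap-year point, here is a minimal sketch of dropping 29 February with plain NumPy datetime64 arithmetic. It does not use scikits.timeseries, whose calls are not shown here. The date range 2006 to 2010 and the helper name `not_feb29` are assumptions for illustration.

```python
import numpy as np

def not_feb29(times):
    """Boolean mask that is False for timestamps on 29 February (hypothetical helper)."""
    days = times.astype("datetime64[D]")
    months = days.astype("datetime64[M]")
    month_num = months.astype(int) % 12 + 1  # 1..12
    # Days elapsed since the first of the month, plus one, gives the day of month.
    day_num = (days - months.astype("datetime64[D]")).astype(int) + 1
    return ~((month_num == 2) & (day_num == 29))

# Assumed time axis: 6-hourly steps over 2006-2010 (2008 is a leap year).
times = np.arange(np.datetime64("2006-01-01T00"),
                  np.datetime64("2011-01-01T00"),
                  np.timedelta64(6, "h"))

mask = not_feb29(times)
kept = times[mask]  # the same mask can index the data array: data[mask]
```

After this filter every year has 365 * 4 steps, so a reshape by year becomes safe again.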
I don't see a real need to reshape the array, since you can embed the year-size information in your sampling process and leave the array in its original shape.
For example, you can generate a random offset (from 0 to 364) and pick the slice with index, say, n*365 + offset.
Anyway, I don't think your question is complete, because I didn't quite understand what you need to do, or why.
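The index arithmetic suggested here can be extended to several values per day. The following is a sketch under assumptions, not the answerer's code: the helper name `sample_flat_indices`, the stand-in data, and the layout (years stored one after another, 365 days of 4 steps each, 29 February removed) are all hypothetical.

```python
import numpy as np

def sample_flat_indices(n_years, n_days=365, per_day=4, rng=None):
    """Flat time indices for one synthetic year (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    years = rng.integers(0, n_years, size=n_days)  # one source year per day
    # Index of the first step of each chosen day: year * steps_per_year + day * per_day
    first_step = years * (n_days * per_day) + np.arange(n_days) * per_day
    # Add the within-day offsets 0..per_day-1 and flatten to a 1D index array.
    return (first_step[:, None] + np.arange(per_day)).ravel()

# Stand-in data with shape (time, X, Y); the original shape is left untouched.
data = np.arange(5 * 1460 * 2 * 2).reshape(5 * 1460, 2, 2)

idx = sample_flat_indices(n_years=5, rng=np.random.default_rng(0))
sampled = data[idx]  # shape (1460, 2, 2)
```

Because `idx` is an ordinary integer array, it can be generated once per loop iteration and reused to index any variable that shares the same time axis.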