HDF5: tagging a dataset as events in another dataset
I am sampling time series data off various machines, and every so often need to collect a large high frequency burst of data from another device and append it to the time series data.
Imagine I am measuring temperature over time, and every 10-degree increase in temperature I sample a micro at 200 kHz. I want to be able to tag the large burst of micro data to a timestamp in the time-series data. Maybe even in the form of a figure.
I was trying to do this with regionref, but am struggling to find an elegant solution, and I'm finding myself juggling between pandas HDFStore and h5py, which just feels messy.
Initially I thought I would be able to make separate datasets from the burst data, then use references or links to timestamps in the time-series data. But no luck so far.
Any way to reference a large packet of data to a timestamp in another pile of data would be appreciated!
How did you use region references? I assume you had an array of references, alternating between ranges of "standard rate" and "burst rate" data. That is a valid approach, and it will work. However, you are correct: it's messy to create, and messy to recover the data.
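To make the messiness concrete, here is a minimal sketch of the region-reference approach. The file and dataset names ('regref_demo.h5', 'data_log', 'burst_refs') and the slice bounds are illustrative assumptions, not taken from the question:

```python
import numpy as np
import h5py

# Hypothetical sketch: tag a window of the time-series with a region reference.
# All names and slice bounds here are made up for illustration.
with h5py.File('regref_demo.h5', 'w') as f:
    dset = f.create_dataset('data_log', data=np.arange(100.0))
    # Reference the slice of samples belonging to one burst window
    regref = dset.regionref[10:20]
    # Store references in their own dataset so they persist in the file
    refs = f.create_dataset('burst_refs', (1,), dtype=h5py.regionref_dtype)
    refs[0] = regref

# Recovering the data is where it gets awkward: dereference twice --
# once to get the dataset, once more to apply the region selection.
with h5py.File('regref_demo.h5', 'r') as f:
    r = f['burst_refs'][0]
    subset = f[r][r]
    print(subset.shape)   # (10,)
```

This works, but every read requires the double dereference, and the bookkeeping grows with each burst, which is the messiness the virtual-dataset approach below avoids.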
Virtual Datasets might be a more elegant solution... but tracking and creating the virtual layout definitions could get messy too. :-) However, once you have the virtual dataset, you can read it with typical slice notation; HDF5/h5py handles everything under the covers.
To demonstrate, I created a "simple" example (realizing virtual datasets aren't "simple"). That said, if you can figure out region references, you can figure out virtual datasets. See the h5py Virtual Dataset Documentation and Example for details. Here is a short summary of the process:
Note: virtual datasets can be in a separate file, or in the same file as the referenced datasets. I will show both in the example. (Once you have defined the layout and sources, both methods are equally easy.)
There are at least 3 other SO questions and answers on this topic:
Example follows:
Step 1: Create some example data. Without your schema, I guessed at how you stored "standard rate" and "burst rate" data. All standard-rate data is stored in dataset 'data_log', and each burst is stored in a separate dataset named 'burst_log_##'.
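A sketch of Step 1 follows. The file name ('sample_data.h5'), the compound (time, temp) record layout, and the sample counts are all guesses standing in for your real schema:

```python
import numpy as np
import h5py

# Assumed schema: standard-rate data is a compound dataset 'data_log' of
# (time, temp) records; each burst is its own dataset 'burst_log_##'.
# File name, dtypes, and sizes are illustrative guesses.
n_std, n_burst, n_bursts = 100, 2000, 3

with h5py.File('sample_data.h5', 'w') as h5f:
    dt = np.dtype([('time', 'f8'), ('temp', 'f8')])
    rec = np.zeros(n_std, dtype=dt)
    rec['time'] = np.arange(n_std, dtype='f8')            # 1 Hz timestamps
    rec['temp'] = 20.0 + np.cumsum(np.random.rand(n_std)) # rising temperature
    h5f.create_dataset('data_log', data=rec)
    for i in range(1, n_bursts + 1):
        # A block of 200 kHz burst samples, stored as a plain float array
        h5f.create_dataset(f'burst_log_{i:02d}', data=np.random.rand(n_burst))
```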
Step 2: This is where the virtual layout and sources are defined and used to create the virtual dataset. This creates one virtual dataset in a new file, and one in the existing file. (The statements are identical except for the file name and mode.)
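A sketch of Step 2, reusing the assumed names from Step 1 ('sample_data.h5', 'burst_log_##'); it also (re)creates minimal burst data so the snippet runs standalone. The virtual dataset simply concatenates the bursts end to end, so one slice reads across all of them:

```python
import numpy as np
import h5py

# Shapes/names match the Step 1 guesses; adjust to your real schema.
n_burst, n_bursts = 2000, 3

# Create minimal source data if it is not already there, so this runs standalone
with h5py.File('sample_data.h5', 'a') as f:
    for i in range(1, n_bursts + 1):
        name = f'burst_log_{i:02d}'
        if name not in f:
            f.create_dataset(name, data=np.full(n_burst, float(i)))

# Define the virtual layout: map each burst into one long 1-D array
layout = h5py.VirtualLayout(shape=(n_bursts * n_burst,), dtype='f8')
for i in range(1, n_bursts + 1):
    vsource = h5py.VirtualSource('sample_data.h5', f'burst_log_{i:02d}',
                                 shape=(n_burst,))
    layout[(i - 1) * n_burst : i * n_burst] = vsource

# Virtual dataset in a brand-new file...
with h5py.File('vds_file.h5', 'w') as f:
    f.create_virtual_dataset('burst_vdata', layout, fillvalue=-1.0)

# ...and the same virtual dataset in the existing file.
with h5py.File('sample_data.h5', 'a') as f:
    f.create_virtual_dataset('burst_vdata', layout, fillvalue=-1.0)

# Read with ordinary slice notation -- h5py resolves the mapping
with h5py.File('vds_file.h5', 'r') as f:
    print(f['burst_vdata'].shape)   # (6000,) -- one long virtual array
```

To match your use case, you would interleave standard-rate ranges and burst ranges in the layout instead of concatenating bursts only; the mechanics are the same.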