How to query HDF5 time series
I store (non-equidistant) time series as tables in HDF5 files using the H5TB API. The format looks like this:
time channel1 channel2
0.0 x x
1.0 x x
2.0 x x
There are also insertions of "detail data" like this:
time channel1 channel2
0.0 x x
1.0 x x
1.2 x x
1.4 x x
1.6 x x
1.8 x x
2.0 x x
Now I want to store the data in another data format, so I would like to "query" the HDF5 file like this:
select ch1 where time > 1.6 && time < 3.0
I thought of several ways to do this query:
- There is a built-in feature called B-tree index. Is it possible to use it for indexing the data?
- I could do a binary search on the time channel and then read the channel values.
- I could create an index myself (and update it whenever there is a detail insertion). What would be the best algorithm to use here?
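A minimal C++ sketch of the binary-search option, assuming the time column has already been read into a std::vector<double> sorted ascending, as in the tables above (the read itself is not shown). The function name rows_in_open_interval is hypothetical. It returns the half-open row range [first, last) matching time > t_min && time < t_max, so only those rows of the data channels need to be read:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the half-open row range [first, last) of rows
// whose time satisfies t_min < time < t_max. `times` is the time column read
// into memory and is assumed to be sorted ascending.
std::pair<std::size_t, std::size_t> rows_in_open_interval(
    const std::vector<double>& times, double t_min, double t_max) {
    assert(std::is_sorted(times.begin(), times.end()));
    // First row with time > t_min (strict lower limit of the query).
    auto first = std::upper_bound(times.begin(), times.end(), t_min);
    // First row at or after `first` with time >= t_max; rows before it are < t_max.
    // Searching from `first` keeps last >= first even if t_max <= t_min.
    auto last = std::lower_bound(first, times.end(), t_max);
    return {static_cast<std::size_t>(first - times.begin()),
            static_cast<std::size_t>(last - times.begin())};
}
```

For the detail table above, rows_in_open_interval(times, 1.6, 3.0) yields rows 5 and 6 (times 1.8 and 2.0). Both searches are O(log n), so no separate index structure is needed as long as the table stays sorted by time.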
The main motivation for an index would be fast query responses.
What would you suggest here?
Comments (3)
I finally found another (obvious) solution myself. The easiest way is to open the HDF5 file, read only the time channel, and build an in-memory map before reading the data channels. This could be optimized further by reading the time channel with a sparse hyperslab.
Once the row indexes for a particular time range are known, only those rows of the data channels need to be read.
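One possible reading of the sparse-hyperslab idea, sketched in C++ under stated assumptions: a strided read of the time column (every stride-th value) gives a small coarse index; a binary search on it narrows the answer to a single block of at most stride rows, and only that block of the full time column then has to be searched. Here the full column is an in-memory std::vector<double> standing in for the file, and CoarseIndex, build_coarse_index and boundary are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coarse index: every `stride`-th timestamp, i.e. what a strided
// hyperslab read of the time column would return.
struct CoarseIndex {
    std::vector<double> sample;  // sample[j] == times[j * stride]
    std::size_t stride;
};

CoarseIndex build_coarse_index(const std::vector<double>& times, std::size_t stride) {
    assert(stride > 0);
    CoarseIndex idx{{}, stride};
    for (std::size_t i = 0; i < times.size(); i += stride) idx.sample.push_back(times[i]);
    return idx;
}

// First row whose time is > t (strict == true) or >= t (strict == false).
// `times` stands in for the sorted on-disk time column: after the coarse
// search only rows [lo, hi), at most `stride` of them, are examined.
std::size_t boundary(const CoarseIndex& idx, const std::vector<double>& times,
                     double t, bool strict) {
    // True for every row lying before the boundary (a prefix of sorted data).
    auto before = [&](double v) { return strict ? v <= t : v < t; };
    // Number of sampled timestamps lying before the boundary.
    std::size_t k = static_cast<std::size_t>(
        std::partition_point(idx.sample.begin(), idx.sample.end(), before) -
        idx.sample.begin());
    if (k == 0) return 0;  // the very first row is already past the boundary
    // times[(k-1)*stride] lies before the boundary; times[k*stride] (if any) does not.
    std::size_t lo = (k - 1) * idx.stride;
    std::size_t hi = std::min(k * idx.stride, times.size());
    return static_cast<std::size_t>(
        std::partition_point(times.begin() + lo, times.begin() + hi, before) -
        times.begin());
}
```

With stride = 3 the detail table above gives the coarse index {0.0, 1.4, 2.0}; boundary(idx, times, 1.6, true) returns row 5 and boundary(idx, times, 3.0, false) returns row 7, so rows 5 and 6 are the ones to read from the data channels.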
Assuming you're not asking how to parse the data out of an HDF5 file, but merely how to use the data once parsed: given

class channel_data { ... };

a std::map<double, channel_data> should suit your needs, specifically std::map<>::lower_bound() and std::map<>::upper_bound().
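A minimal sketch of that suggestion. The answer leaves channel_data unspecified, so a two-field struct stands in for it here (hypothetical), and select_ch1 is an illustrative name. upper_bound(t_min) gives the first entry with time > t_min and lower_bound(t_max) the first entry with time >= t_max, so iterating between them answers select ch1 where time > t_min && time < t_max:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for channel_data; the real fields are elided in the answer.
struct channel_data {
    double ch1;
    double ch2;
};

// Equivalent of: select ch1 where time > t_min && time < t_max
std::vector<double> select_ch1(const std::map<double, channel_data>& series,
                               double t_min, double t_max) {
    std::vector<double> result;
    // Without this guard, lower_bound(t_max) could precede upper_bound(t_min)
    // and the loop below would run past the end of the map.
    if (!(t_min < t_max)) return result;
    auto first = series.upper_bound(t_min);  // first key > t_min
    auto last = series.lower_bound(t_max);   // first key >= t_max
    for (auto it = first; it != last; ++it) result.push_back(it->second.ch1);
    return result;
}
```

Detail insertions can simply be added with series.insert_or_assign(t, data) (C++17), which keeps the map ordered at O(log n) per record, so there is no separate index to maintain.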
A popular approach to solving this problem appears to be using bitmap indexing. There are also papers written on doing this, but they do not appear to have published any code.
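Since no code accompanies that suggestion, here is a minimal C++ sketch of one common variant, a binned bitmap index; it is not taken from the papers mentioned. Assumptions: uncompressed bitmaps stored as 64-bit words, caller-chosen bin edges that cover every value, and the hypothetical class name BinnedBitmapIndex. Each bin stores one bitmap of the rows whose time falls into it; a range query ORs the bitmaps of bins lying entirely inside the range and checks the actual values only for rows in the boundary bins:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical binned bitmap index over one numeric column (here: time).
// Bin b covers [edges[b], edges[b+1]); its bitmap has bit `row` set when
// values[row] falls into that bin.
class BinnedBitmapIndex {
public:
    using Bitmap = std::vector<std::uint64_t>;

    BinnedBitmapIndex(std::vector<double> values, std::vector<double> edges)
        : values_(std::move(values)), edges_(std::move(edges)) {
        assert(edges_.size() >= 2 && std::is_sorted(edges_.begin(), edges_.end()));
        bitmaps_.assign(edges_.size() - 1, Bitmap(words(), 0));
        for (std::size_t row = 0; row < values_.size(); ++row)
            bitmaps_[bin_of(values_[row])][row / 64] |= std::uint64_t{1} << (row % 64);
    }

    // Rows with lo < value < hi, in ascending row order.
    std::vector<std::size_t> query_open(double lo, double hi) const {
        Bitmap result(words(), 0);
        for (std::size_t b = 0; b < bitmaps_.size(); ++b) {
            const double b_lo = edges_[b], b_hi = edges_[b + 1];
            if (b_hi <= lo || b_lo >= hi) continue;  // no row in this bin can match
            const Bitmap& bm = bitmaps_[b];
            if (b_lo > lo && b_hi <= hi) {
                // Bin lies entirely inside the range: OR its bitmap in whole.
                for (std::size_t w = 0; w < bm.size(); ++w) result[w] |= bm[w];
            } else {
                // Boundary bin: its rows are only candidates, check the values.
                for (std::size_t row = 0; row < values_.size(); ++row)
                    if (((bm[row / 64] >> (row % 64)) & 1u) &&
                        values_[row] > lo && values_[row] < hi)
                        result[row / 64] |= std::uint64_t{1} << (row % 64);
            }
        }
        std::vector<std::size_t> rows;
        for (std::size_t row = 0; row < values_.size(); ++row)
            if ((result[row / 64] >> (row % 64)) & 1u) rows.push_back(row);
        return rows;
    }

private:
    std::size_t words() const { return (values_.size() + 63) / 64; }

    // Index b with edges[b] <= v < edges[b+1].
    std::size_t bin_of(double v) const {
        assert(v >= edges_.front() && v < edges_.back());
        return static_cast<std::size_t>(
                   std::upper_bound(edges_.begin(), edges_.end(), v) - edges_.begin()) - 1;
    }

    std::vector<double> values_;
    std::vector<double> edges_;
    std::vector<Bitmap> bitmaps_;
};
```

With edges {0.0, 1.0, 2.0, 3.0} on the detail table above, query_open(1.6, 3.0) takes the [2.0, 3.0) bin whole and checks only the rows of the [1.0, 2.0) bin, returning rows 5 and 6. Unlike a binary search, this does not require the column to be sorted; production bitmap indexes usually also compress the bitmaps.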