Hierarchical .NET file format?

Posted on 2024-07-29 05:38:33

Our company has been looking for a while for a file format to hold a large amount of lab sensor data. Each time they run the instrumentation, it generates a file, which we consume and store in a database for trending, etc. A hierarchical format is preferred, as it allows us to "group" data. This is an intermediate file format used before we place the data into a database. Due to our development environment, this is our priority list:

1) .NET compliant. The API will be used in web services and a client application. We do not have any control over the customer's environment, so a pure .NET solution is best.

2) Speed of reads. Our reads are random, not sequential. The faster the better. If we were not a C# development shop I would say speed is #1.

3) File size. If the file itself is large, a good compression ratio (86% or better) is desired.

4) Memory footprint of the reads. Due to the volume of data, we cannot simply read it all into memory. Each sensor records time/value pairs, and this can add up to well over 4 million pairs. This has eliminated XML for us.
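
To make requirements 2 and 4 concrete: at 16 bytes per pair, 4 million time/value pairs come to about 64 MB uncompressed, and a fixed-width binary layout lets a reader seek straight to the pair it needs instead of loading the whole file. The following is a minimal sketch in Python rather than C#; the record layout (two little-endian float64 values) and the function names are hypothetical.

```python
import os
import struct

# Assumed layout (hypothetical): little-endian float64 time, float64 value.
RECORD = struct.Struct("<dd")  # 16 bytes per time/value pair


def write_pairs(path, pairs):
    """Write (time, value) pairs as consecutive fixed-width records."""
    with open(path, "wb") as f:
        for t, v in pairs:
            f.write(RECORD.pack(t, v))


def count_pairs(path):
    """Number of records, derived from the file size alone."""
    return os.path.getsize(path) // RECORD.size


def read_pair(path, index):
    """Read one pair by seeking to its byte offset.

    Only RECORD.size bytes are read, so memory use stays constant no
    matter how many pairs the file holds.
    """
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(f"pair {index} is past the end of {path}")
    return RECORD.unpack(data)
```

Compressing the whole file as a single gzip stream would defeat the seek, so if requirement 3 also matters, a common approach is to compress fixed-size blocks separately and keep an index of block offsets.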

We have looked at HDF5 and found that its API is badly lacking in the .NET arena and cannot be used from web services, although the format has the size/speed we are looking for. I have also looked into JSON, and it looked promising, but I haven't yet tried reading a piece of the data back. I have searched the web and not found many file formats that do what we need. Any help is appreciated.

Comments (4)

香橙ぽ 2024-08-05 05:38:33

You need a B-tree database, such as:
SQL Server Compact

Also look at SQLite
http://sqlite.phxsoftware.com/

CTree is more of an ISAM, if you can dispense with the SQL part;
google for ctree.

Sorry, I'd link more, but SO isn't letting me because this is a new account.
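
A minimal sketch of the SQLite route, using Python's built-in sqlite3 module in place of the .NET provider linked above; the table and column names are hypothetical. Declaring (sensor_id, t) as the primary key of a WITHOUT ROWID table stores the rows in a B-tree ordered by that key, so a time-window read descends the tree instead of scanning the whole table.

```python
import sqlite3

# Hypothetical schema: one row per time/value pair. A WITHOUT ROWID table is
# stored as a B-tree clustered on its PRIMARY KEY, here (sensor_id, t).
SCHEMA = """
CREATE TABLE IF NOT EXISTS reading (
    sensor_id INTEGER NOT NULL,
    t         REAL    NOT NULL,
    value     REAL    NOT NULL,
    PRIMARY KEY (sensor_id, t)
) WITHOUT ROWID
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load(conn, sensor_id, pairs):
    """Insert all (t, value) pairs for one sensor in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO reading (sensor_id, t, value) VALUES (?, ?, ?)",
            ((sensor_id, t, v) for t, v in pairs),
        )


def window(conn, sensor_id, t0, t1):
    """Fetch one sensor's pairs with t0 <= t <= t1 via the primary-key B-tree."""
    return conn.execute(
        "SELECT t, value FROM reading"
        " WHERE sensor_id = ? AND t BETWEEN ? AND ? ORDER BY t",
        (sensor_id, t0, t1),
    ).fetchall()
```

With millions of rows per run, loading inside one transaction (as `load` does through `with conn:`) matters; committing row by row is far slower in SQLite.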

心意如水 2024-08-05 05:38:33

I think you might be better off storing this information in a table in your database. If you are using SQL Server, a VARBINARY column should do the job.

Your table can be hierarchical by including a [Parent] field that can be null for top-level nodes.
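
A minimal sketch of such a self-referencing table, written against SQLite through Python's sqlite3 module so it runs anywhere; the table and column names are hypothetical. A recursive common table expression fetches one group and everything beneath it. SQL Server supports the same pattern, except that T-SQL writes plain `WITH` instead of `WITH RECURSIVE`.

```python
import sqlite3


def create_schema(conn):
    """Hypothetical grouping table; parent_id plays the role of [Parent]."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS node (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES node(id),  -- NULL for top-level nodes
            name      TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS node_parent ON node(parent_id);
        """
    )


def subtree(conn, root_id):
    """Return (id, name, depth) for root_id and every node beneath it."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, tree.depth + 1
            FROM node AS n JOIN tree ON n.parent_id = tree.id
        )
        SELECT id, name, depth FROM tree ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()
```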

If you index your lookup value (the ID of the file), random access should be quick. If you need compression, you can try using the GZip classes to compress your raw byte[] before storing it in the database.
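
A sketch of the compress-then-store step, again in Python, with SQLite standing in for SQL Server and the `gzip` module standing in for .NET's `GZipStream`; the record layout and the table are assumptions. The whole blob has to be decompressed to read any pair from it, so this approach trades random-read speed for file size.

```python
import gzip
import sqlite3
import struct

PAIR = struct.Struct("<dd")  # assumed layout: float64 time, float64 value


def pack_pairs(pairs):
    """Serialize (t, value) pairs to raw bytes, then gzip the result."""
    raw = b"".join(PAIR.pack(t, v) for t, v in pairs)
    return gzip.compress(raw)


def unpack_pairs(blob):
    """Decompress a payload and split it back into (t, value) tuples."""
    return list(PAIR.iter_unpack(gzip.decompress(blob)))


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_file ("
        " file_id INTEGER PRIMARY KEY,"  # indexed lookup value (file ID)
        " payload BLOB NOT NULL)"        # stands in for a VARBINARY column
    )


def store(conn, file_id, pairs):
    with conn:
        conn.execute(
            "INSERT INTO run_file (file_id, payload) VALUES (?, ?)",
            (file_id, pack_pairs(pairs)),
        )


def fetch(conn, file_id):
    """Look up one file by ID; returns None if it is not stored."""
    row = conn.execute(
        "SELECT payload FROM run_file WHERE file_id = ?", (file_id,)
    ).fetchone()
    return None if row is None else unpack_pairs(row[0])
```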

Using a database for this information gives you the ability to:

1) Run crazy queries, joins, etc.
2) Index multiple columns for faster lookups by different key values
3) Choose from the multiple data-access APIs that .NET certainly has
4) Add compression, if it doesn't affect speed too badly
5) Back up the data easily; that should be a cinch

Does this advice help you out?

能否归途做我良人 2024-08-05 05:38:33

I think the special reading requirement would be a problem for any format, and in this case you'll need to implement your own parser.

停滞 2024-08-05 05:38:33

If a binary-tree/balanced-tree format isn't too much effort, you could look into storing it in Newick format. It can also support a key/value pair format like JSON.

It's not really any more lightweight than JSON, however: "{}" are replaced with "()".

((raccoon,bear),((sea_lion,seal),((monkey,cat),weasel)),dog);

Being a binary tree, it's obviously very fast to query, though probably no faster than a dictionary built from a JSON object. However, it has no linked-list-style hierarchy (object graph) to worry about.

I'm afraid I couldn't find any .NET APIs for it, though, just Java and C.
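
The basic grammar is small enough that a parser takes little effort to write. The following is a minimal sketch in Python (not a .NET API) that handles only parentheses, commas and leaf labels; branch lengths, internal-node labels, quoted labels and comments are not supported, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a basic Newick string into nested lists of leaf labels."""
    # Whitespace is dropped; Newick labels conventionally use '_' for spaces.
    s = "".join(text.split())
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return children
        start = pos
        while s[pos] not in ",();":
            pos += 1
        return s[start:pos]  # leaf label (may be empty)

    tree = node()
    if s[pos] != ";":
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


def leaves(tree):
    """Leaf labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in leaves(child)]
```

Applied to the example above, `parse_newick` returns `[['raccoon', 'bear'], [['sea_lion', 'seal'], [['monkey', 'cat'], 'weasel']], 'dog']`.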
