Best (free) way to store data? How about updates to the file system?

Posted 2024-07-05 12:27:12

I have an idea for how to solve this problem, but I wanted to know if there's an easier and more extensible approach.

The program I'm working on has two basic forms of data: images, and the information associated with those images. The information associated with the images has been previously stored in a JET database of extreme simplicity (four tables) which turned out to be both slow and incomplete in the stored fields. We're moving to a new implementation of data storage. Given the simplicity of the data structures involved, I was thinking that a database was overkill.

Each image will have information of its own (capture parameters), will be part of a group of images which are interrelated (taken in the same thirty-minute period, say), and then part of a larger group altogether (taken of the same person). Right now, I'm storing people in a dictionary with a unique identifier. Each person then has a List of the different groups of pictures, and each picture group has a List of pictures. All of these classes are serializable, and I'm just serializing and deserializing the dictionary. Fairly straightforward stuff. Images are stored separately, so that the dictionary doesn't become astronomical in size.

The problem is: what happens when I need to add new information fields? Is there an easy way to set up these data structures to account for potential future revisions? In the past, the way I'd handle this in C was to create a serializable struct with lots of empty bytes (at least a kilobyte) for future extensibility, with one of the bytes in the struct indicating the version. Then, when the program read the struct, it would know which deserialization to use based on a massive switch statement (and old versions could read new data, because extraneous data would just go into fields which are ignored).

Does such a scheme exist in C#? Like, if I have a class that's a group of String and Int objects, and then I add another String object to the class, how can I deserialize an old object from disk and then add the string to it? Do I need to resign myself to having multiple versions of the data classes, and a factory which takes a deserialization stream and handles deserialization based on some version information stored in a base class? Or is a class like Dictionary ideal for storing this kind of information, as it will deserialize all the fields on disk automatically, and if there are new fields added in, I can just catch exceptions and substitute in blank Strings and Ints for those values?

If I go with the dictionary approach, is there a speed hit associated with file reads/writes as well as parameter retrieval times? I figure that if there are just fields in a class, then field retrieval is instant, but in a dictionary, there's some small overhead associated with that class.

Thanks!

Comments (6)

感情旳空白 2024-07-12 12:27:12

Just a wee word of warning: SQLite, Protocol Buffers, mmap, et al. are all very good, but you should prototype and test each implementation and make sure that you're not going to hit the same perf issues or different bottlenecks.

The simplest option may be just to upsize to SQL Server Express (you may be surprised at the perf gain) and fix whatever's missing from the present database design. Then, if perf is still an issue, start investigating these other technologies.

智商已欠费 2024-07-12 12:27:12

My brain is fried at the moment, so I'm not sure I can advise for or against a database, but if you're looking for version-agnostic serialization, you'd be a fool to not at least check into Protocol Buffers.
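
To make "version-agnostic" concrete, here is a minimal sketch of the underlying idea: every field is written with a numeric tag, readers skip tags they don't know, and missing tags fall back to defaults. It is written in Python rather than C# purely for illustration, it is not the actual Protocol Buffers wire format or API, and the tag numbers, field names, and helper functions are all hypothetical.

```python
import struct

# Hypothetical field tags for an image record. The numbers are arbitrary,
# but a tag must never be reused once data has been written with it.
TAG_EXPOSURE_MS = 1   # unsigned 32-bit integer
TAG_OPERATOR = 2      # UTF-8 string
TAG_NOTES = 3         # UTF-8 string, added in a later version


def encode_fields(fields):
    """Encode {tag: payload_bytes} as repeated (tag, length, payload) records."""
    out = bytearray()
    for tag, payload in fields.items():
        out += struct.pack("<BH", tag, len(payload))  # 1-byte tag, 2-byte length
        out += payload
    return bytes(out)


def decode_fields(data):
    """Decode every (tag, length, payload) record, whether the tag is known or not."""
    fields = {}
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from("<BH", data, offset)
        offset += 3
        fields[tag] = data[offset:offset + length]
        offset += length
    return fields


def read_record_v1(data):
    """An 'old' reader: knows tags 1 and 2 only and ignores everything else."""
    fields = decode_fields(data)
    exposure = 0
    if TAG_EXPOSURE_MS in fields:
        exposure = struct.unpack("<I", fields[TAG_EXPOSURE_MS])[0]
    operator = fields.get(TAG_OPERATOR, b"").decode("utf-8")
    return {"exposure_ms": exposure, "operator": operator}


def read_record_v2(data):
    """A 'new' reader: also knows tag 3 and defaults it to "" when absent."""
    record = read_record_v1(data)
    record["notes"] = decode_fields(data).get(TAG_NOTES, b"").decode("utf-8")
    return record


old_data = encode_fields({TAG_EXPOSURE_MS: struct.pack("<I", 250),
                          TAG_OPERATOR: b"alice"})
new_data = encode_fields({TAG_EXPOSURE_MS: struct.pack("<I", 250),
                          TAG_OPERATOR: b"alice",
                          TAG_NOTES: b"retake"})
```

With this layout an old reader can load `new_data` and a new reader can load `old_data`, with no padding bytes and no switch on a version number.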

Here's a quick list of implementations I know about for C#/.NET:

想念有你 2024-07-12 12:27:12

There's a database schema, for which I can't remember the name, that can handle this sort of situation. You basically have two tables. One table stores the variable name, and the other stores the variable value. If you want to group the variables, then add a third table that will have a one-to-many relationship with the variable name table. This setup has the advantage of letting you keep adding different variables without having to keep changing your database schema. Saved my bacon quite a few times when dealing with departments that change their mind frequently (like Marketing).

The only drawback is that the variable value table will need to store the actual value as a string column (varchar or nvarchar actually). Then you have to deal with the hassle of converting the values back to their native representations. I currently maintain something like this. The variable table currently has around 800 million rows. It's still fairly fast, as I can still retrieve certain variations of values in under one second.
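
This layout is commonly called entity-attribute-value (EAV). A minimal sketch of it, using Python's built-in sqlite3 module, might look like the following. The table names, the `type` column used to convert strings back to native values, and the `picture_id` key are assumptions made for illustration; they are not taken from the schema described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variable_name (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        type TEXT NOT NULL            -- 'int', 'float' or 'str' (assumed convention)
    );
    CREATE TABLE variable_value (
        picture_id  INTEGER NOT NULL,
        variable_id INTEGER NOT NULL REFERENCES variable_name(id),
        value       TEXT NOT NULL,    -- every value is stored as a string
        PRIMARY KEY (picture_id, variable_id)
    );
""")

# Maps the stored type name back to a Python constructor.
CONVERTERS = {"int": int, "float": float, "str": str}


def set_value(picture_id, name, value):
    """Store one variable for one picture, registering the name on first use."""
    type_name = type(value).__name__
    if type_name not in CONVERTERS:
        raise TypeError("unsupported value type: " + type_name)
    # The type recorded on first use is kept for all later values of this name.
    conn.execute(
        "INSERT OR IGNORE INTO variable_name (name, type) VALUES (?, ?)",
        (name, type_name),
    )
    (variable_id,) = conn.execute(
        "SELECT id FROM variable_name WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO variable_value VALUES (?, ?, ?)",
        (picture_id, variable_id, str(value)),
    )


def get_values(picture_id):
    """Return {name: value} for one picture, converted back to native types."""
    rows = conn.execute(
        "SELECT n.name, n.type, v.value "
        "FROM variable_value v JOIN variable_name n ON n.id = v.variable_id "
        "WHERE v.picture_id = ?",
        (picture_id,),
    )
    return {name: CONVERTERS[type_name](value) for name, type_name, value in rows}


set_value(1, "exposure_ms", 250)
set_value(1, "gain", 1.5)
set_value(1, "operator", "alice")
```

Adding a new capture parameter is then just another call to `set_value` with a new name; no schema change is needed, at the cost of the string conversion described above.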

虫児飞 2024-07-12 12:27:12

I'm no C# programmer but I like the mmap() call and saw there is a project doing such a thing for C#.

See Mmap

Structured files perform very well if tailored for a specific application, but they are difficult to manage and the code is hardly reusable. A better solution is a virtual memory-like implementation.

  • Up to 4 gigabytes of information can be managed.
  • Space can be optimized to the real data size.
  • All the data can be viewed as a single array and accessed with read/write operations.
  • No need to design a storage structure; just use it and store.
  • Can be cached.
  • Is highly reusable.
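
As a rough illustration of the "single array" view, here is a sketch using Python's standard mmap module (not the C# project mentioned above). The fixed-size record layout, the file name, and the record count are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: unsigned 32-bit id followed by a float64 value.
RECORD = struct.Struct("<Id")
RECORD_COUNT = 4

# Create a zero-filled file large enough for RECORD_COUNT records.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (RECORD.size * RECORD_COUNT))

# Map the file and write record number 2 in place, as if it were an array slot.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as view:
        RECORD.pack_into(view, 2 * RECORD.size, 42, 1.25)
        view.flush()

# Map it again read-only and fetch the same slot without reading the whole file.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        record_id, value = RECORD.unpack_from(view, 2 * RECORD.size)
```

Only the pages that are actually touched get read from or written to disk, which is what makes this attractive for large files. The trade-off is that the record layout is fixed, so adding a field later still needs a versioning scheme of its own.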

我爱人 2024-07-12 12:27:12

So go with SQLite for the following reasons:
1. You don't need to read/write the entire database from disk every time
2. Much easier to add fields later, even if you don't leave enough placeholders at the beginning
3. Easier to search based on anything you want
4. Easier to change data in ways beyond what the application was designed for

Problems with the Dictionary approach
1. Unless you make a smart dictionary, you need to read/write the entire database every time (and unless you carefully design the data structure, it will be very hard to maintain backwards compatibility)
----- a) if you did not leave enough placeholders, bye bye
2. It appears as if you'd have to do a linear search through all the photos in order to search on one of the capture attributes
3. Can a picture be in more than one group? Can a picture be under more than one person? Can two people be in the same group? With dictionaries these things can get hairy...

With a database table, if you get a new attribute you can just say ALTER TABLE Picture ADD Attribute DataType. Then, as long as you don't add a rule saying the attribute must have a value (a NOT NULL constraint), you can still load and save older versions. At the same time, the newer versions can use the new attributes.
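
A small sketch of that migration, using Python's built-in sqlite3 module for illustration; the `FilePath` and `ExposureMs` columns and the file paths are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the application: pictures only have a file path.
conn.execute(
    "CREATE TABLE Picture (PictureID INTEGER PRIMARY KEY, FilePath TEXT NOT NULL)"
)
conn.execute("INSERT INTO Picture (FilePath) VALUES (?)", ("images/0001.png",))

# Version 2 adds a nullable column; existing rows simply get NULL for it.
conn.execute("ALTER TABLE Picture ADD COLUMN ExposureMs INTEGER")
conn.execute(
    "INSERT INTO Picture (FilePath, ExposureMs) VALUES (?, ?)",
    ("images/0002.png", 250),
)

# A version 1 query still works because it never mentions the new column.
old_style = conn.execute(
    "SELECT PictureID, FilePath FROM Picture ORDER BY PictureID"
).fetchall()

# A version 2 query sees NULL (None in Python) for rows written before the change.
new_style = conn.execute(
    "SELECT PictureID, FilePath, ExposureMs FROM Picture ORDER BY PictureID"
).fetchall()
```

Old code keeps working as long as it names its columns explicitly instead of relying on `SELECT *` or positional inserts.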

Also you don't need to save the picture in the database. You could just store the path to the picture in the database. Then when the app needs the picture, just load it from a disk file. This keeps the database size smaller. Also the extra seek time to get the disk file will most likely be insignificant compared to the time to load the image.

Probably your table should be
Picture(PictureID, GroupID?, File Path, Capture Parameter 1, Capture Parameter 2, etc..)

If you want more flexibility you could make a table
CaptureParameter(PictureID, ParameterName, ParameterValue) ... I would advise against this because it is a lot less efficient than just putting them in one table (not to mention the queries to retrieve/search the Capture Parameters would be more complicated).

Person(PersonID, Any Person Attributes like Name/Etc.)
Group(GroupID, Group Name, PersonID?)
PersonGroup?(PersonID, GroupID)
PictureGroup?(GroupID, PictureID)
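
One possible reading of the tables above, sketched with Python's built-in sqlite3 module. It assumes the many-to-many variants (PersonGroup and PictureGroup) are the ones chosen; the column types, the sample rows, and the `pictures_for_person` helper are hypothetical. Group is quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (PersonID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE "Group" (GroupID   INTEGER PRIMARY KEY, GroupName TEXT NOT NULL);
    CREATE TABLE Picture (PictureID INTEGER PRIMARY KEY, FilePath TEXT NOT NULL);

    -- Join tables: a person can be in many groups, a picture in many groups.
    CREATE TABLE PersonGroup (
        PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
        GroupID  INTEGER NOT NULL REFERENCES "Group"(GroupID),
        PRIMARY KEY (PersonID, GroupID)
    );
    CREATE TABLE PictureGroup (
        GroupID   INTEGER NOT NULL REFERENCES "Group"(GroupID),
        PictureID INTEGER NOT NULL REFERENCES Picture(PictureID),
        PRIMARY KEY (GroupID, PictureID)
    );
""")

# Sample data: two people share one group that holds two pictures.
conn.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute('INSERT INTO "Group" VALUES (?, ?)', (10, "Session A"))
conn.executemany(
    "INSERT INTO Picture VALUES (?, ?)",
    [(100, "images/0001.png"), (101, "images/0002.png")],
)
conn.executemany("INSERT INTO PersonGroup VALUES (?, ?)", [(1, 10), (2, 10)])
conn.executemany("INSERT INTO PictureGroup VALUES (?, ?)", [(10, 100), (10, 101)])


def pictures_for_person(person_id):
    """Return the file paths of every picture in any group the person belongs to."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.FilePath
        FROM Picture p
        JOIN PictureGroup pg ON pg.PictureID = p.PictureID
        JOIN PersonGroup  pe ON pe.GroupID   = pg.GroupID
        WHERE pe.PersonID = ?
        ORDER BY p.FilePath
        """,
        (person_id,),
    )
    return [path for (path,) in rows]
```

If a picture can only ever belong to one group and a group to one person, the join tables can be dropped in favor of the GroupID? and PersonID? columns shown in the table list.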

巴黎盛开的樱花 2024-07-12 12:27:12

SQLite is what you want. It's a fast, embeddable, single-file database that has bindings to most languages.

With regards to extensibility, you can store your models with default attributes, and then have a separate table for attribute extensions for future changes.

A year or two down the road, if the code is still in use, you'll be happy that 1) other developers won't have to learn a customized code structure to maintain the code, 2) you can export, view, and modify the data with standard database tools (there's an ODBC driver for SQLite files and various query tools), and 3) you'll be able to scale up to a full database server with minimal code changes.
