Custom flat-file databases - how do I design them?
Normally I would just use SQL/SQLite/Mongo for anything database-y but I thought it would be interesting to try and create my own flat file database structure (it's just a learning project). The application itself is a music streamer with a centralised library on the server.
The database:
My application is a client/server one whereby any changes made by the server sync to all clients. The server does insert, edit and delete operations.
Clients are only able to change a client modifier boolean field in a record (the value of which is specific to that client). No other operations are available to the clients, so there are NO client changes to sync back.
Write operations on the server are rare after the initial database construction but do happen. The priority here is definitely read operations for the clients.
Needs to be scalable up to 500k+ tracks or 2GB (2^31 bytes) database file size (whichever comes first).
What is stored:
- Few tables, with some relations. It's only a mockup, but you get the idea:
+--------+          +--------+          +-------------------+
| id*    |          | id*    |          | id*               |
| ARTIST | ------>  | ARTIST |          | track name        |
|        |          | ALBUM  | ------>  | ALBUM             |
|        |          | year   |          | length            |
|        |          |        |          | filename**        |
|        |          |        |          | {client modifier} |
+--------+          +--------+          +-------------------+

 * unique identifier
** not stored in client version of database
{client modifier} is only on the client version of database
One problem that would have to be overcome is how to deal with the relations and searching to minimise I/O operations.
- All fields are variable length apart from the id, year & length.
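With that mix, a common on-disk layout is to pack the fixed-width fields directly and give each variable-length field a small length prefix. A minimal sketch in Python (the field set and widths are assumptions taken from the mockup, not a prescribed format):

```python
import struct

def encode_track(track_id, year, length, name, filename):
    """Pack fixed-width fields directly; prefix variable-length fields with a length."""
    name_b = name.encode("utf-8")
    file_b = filename.encode("utf-8")
    # id/year/length are fixed width (4 bytes each); strings get a 2-byte length prefix
    return (struct.pack("<iii", track_id, year, length)
            + struct.pack("<H", len(name_b)) + name_b
            + struct.pack("<H", len(file_b)) + file_b)

def decode_track(buf):
    """Reverse of encode_track: read fixed fields, then each length-prefixed string."""
    track_id, year, length = struct.unpack_from("<iii", buf, 0)
    off = 12
    n = struct.unpack_from("<H", buf, off)[0]; off += 2
    name = buf[off:off + n].decode("utf-8"); off += n
    n = struct.unpack_from("<H", buf, off)[0]; off += 2
    filename = buf[off:off + n].decode("utf-8")
    return track_id, year, length, name, filename
```

Because every record knows its own size, records can be read sequentially without any delimiter scanning, which matters for the I/O goals below.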
Required features:
- Server able to sync the database to all clients with minimal operations.
One way to approach this would be to store the date/time each record was last modified and have the client store the date of its last sync. When a client comes online, all changes past that date sync back to the client. Another way would be to have a separate table on the server listing all the operations that have happened and the date they happened, and to sync in a similar fashion.
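The last-modified approach can be sketched roughly like this (the record shape and timestamps are made up for illustration):

```python
def changes_since(records, last_sync):
    """Return every record the server modified after the client's last sync time."""
    return [r for r in records if r["modified"] > last_sync]

# Hypothetical server-side records, each carrying a last-modified timestamp
records = [
    {"id": 1, "modified": 100},  # unchanged since the client's last sync
    {"id": 2, "modified": 250},
    {"id": 3, "modified": 400},
]
delta = changes_since(records, last_sync=200)  # only records 2 and 3
```

One catch with the pure-timestamp variant: a deleted record leaves nothing behind to carry a timestamp, so deletions would need tombstone markers, or the separate operations table mentioned above.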
- Fast read operations for clients
Due to the tables being small, it is possible for a client to store the artists and albums tables in memory, but I am going to assume they won't do this.
What I was thinking of doing is having separate files for each table, with the client keeping each file open all the time to ensure it can read as quickly as possible... is this a bad idea?
Some sort of index will have to be stored for each table recording where each record starts. This could be small enough to load into memory, and could be stored in files separate from the actual tables to avoid issues.
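Such an offset index can be rebuilt by scanning the table file once. A sketch, assuming (hypothetically) that each record starts with a 4-byte id and a 4-byte body length, as in a length-prefixed format:

```python
import struct

def build_index(table_path, index_path):
    """Scan a table file of length-prefixed records; write and return id -> offset."""
    index = {}
    with open(table_path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(8)  # assumed layout: 4-byte id + 4-byte body length
            if len(header) < 8:
                break
            rec_id, rec_len = struct.unpack("<ii", header)
            index[rec_id] = offset
            f.seek(rec_len, 1)  # skip the record body without reading it
    # persist the index in its own small file, separate from the table
    with open(index_path, "wb") as f:
        for rec_id, offset in index.items():
            f.write(struct.pack("<iq", rec_id, offset))
    return index
```

With the index in memory, fetching any record is one `seek` plus one read, regardless of table size.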
- Minimise I/O operations
The server will store an "index" of the tracks database in memory with the id and file name so read operations are kept to a minimum.
The server will also buffer database write operations so that if it detects that a lot of write operations are going to happen in a short space of time, it will wait and then do a batch write. This is possible because the changes to the file system will still be there if the database crashes, so it could just reload all changes on restart.
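The buffered batch-write idea might look something like this (the thresholds are arbitrary, and a real version would also flush on a background timer rather than only on the next write):

```python
import time

class WriteBuffer:
    """Collect write operations and flush them in one batch
    once enough have accumulated or a quiet period has elapsed."""
    def __init__(self, flush_fn, max_pending=64, max_wait=0.5):
        self.flush_fn = flush_fn      # callback that performs the batch I/O
        self.max_pending = max_pending
        self.max_wait = max_wait      # seconds to wait before forcing a flush
        self.pending = []
        self.first_write = None

    def write(self, op):
        if not self.pending:
            self.first_write = time.monotonic()
        self.pending.append(op)
        if (len(self.pending) >= self.max_pending
                or time.monotonic() - self.first_write >= self.max_wait):
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)  # one batch I/O instead of many small ones
            self.pending = []
```

The crash-safety argument in the text is what makes this buffering acceptable: since the source of truth (the files on disk) survives a crash, unflushed operations can be rebuilt on restart.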
- No sparse files, to keep file size to a minimum.
I will be working at a byte level to reduce the file size. The main problem will be fragmentation when a record is deleted: because of the variable-length fields, you can't simply add a new record in its place.
I could de-fragment the file when it reaches a certain fragmentation level (ratio of deleted records to total records), but I'd rather avoid this if I can, as it would be an expensive operation for the clients.
I'd rather not use fixed-length fields either (the filename could be huge, for instance), but these seem to be my only options?
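De-fragmentation here is essentially copy-compaction: stream the live records into a new file and skip the tombstoned ones. A toy in-memory sketch (a real version would stream file-to-file and then rebuild the offset index):

```python
def compact(records):
    """Rewrite a table keeping only live records.

    `records` is a list of (deleted_flag, payload_bytes) tuples standing in
    for the on-disk records. Returns the compacted bytes plus the new offset
    of each surviving record, which the index must be updated with.
    """
    new_data = bytearray()
    offsets = []
    for deleted, payload in records:
        if deleted:
            continue  # reclaim the space held by tombstoned records
        offsets.append(len(new_data))
        new_data += payload
    return bytes(new_data), offsets
```

One mitigation for the cost concern: if the sync protocol ships whole compacted tables (or the clients compact lazily, off the read path), the expensive pass never blocks playback.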
Comments:
So how do I go about this and maximise performance?
Yes, I am looking at reinventing the wheel, and yes, I know I probably won't come anywhere close to the performance of other databases.
Answers (2):
My suggestion would be to design and build your database. Don't worry about performance. Worry about reliability first and foremost.
Going through your features one at a time:
This requires a log of database changes. You have the right idea.
On a modern PC, you can read flat files fast enough. Separate flat files for each table is a good design. If the flat file is small enough (domain tables) you could read them once and keep the table in memory. You'd write the table once, on database shutdown.
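The read-once/write-once domain table could be as simple as this (JSON is just a stand-in here for whatever flat-file format gets chosen):

```python
import json

class DomainTable:
    """A small domain table (e.g. artists): read once at startup,
    served entirely from memory, written back once at shutdown."""
    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.rows = json.load(f)  # whole table fits in memory
        except FileNotFoundError:
            self.rows = {}                # first run: start with an empty table

    def close(self):
        # the single write, at database shutdown
        with open(self.path, "w") as f:
            json.dump(self.rows, f)
```

Reliability first, as the answer says: crash handling (e.g. writing to a temp file and renaming) would be the next thing to add to `close`.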
Databases minimize I/O operations by reading and writing blocks of data. I wouldn't get too concerned about this right away. Your database needs to be reliable.
Most modern PCs have plenty of disk space, so this is another feature that can be put off until later. Database reorganizations are usually under DBA control, because it's such an expensive process.
This is an incomplete project of mine from a few years ago. I can't explain it much more, and it's already obsolete, but it's worth experimenting with (which I never actually did). At first impression it's a totally disorganized (flat-file) database, but in my theory it's not the worst-case scenario. Anyone can add concepts or improvements to this, such as encryption, speed enhancement, data binding, data formatting, etc.
By structure, data is sorted into folders, files, etc. I also think that closing the file connection every time a query is executed will save memory.
Please bear with me: this was proposed a few years ago, so I was still using ASP. Now I use Ruby-on-Rails and NodeJS (and some PHP).
I call this concept folder-file data delegation, in which data is structured into hierarchies by folder and file, and the smallest structures in the database are called atoms.
The 5 Main Objectives of Folder-File Data Delegation

Structure

1. a. By offering a draft structured database system and avoiding rewriting/searching a huge database every time an operation is carried out, Folder-File Data Delegation aims to improve the speed reputation of flat-file database systems.
i. Folder-File Data Delegation stores data using folders and files. There is no need to structure a single-file database for a flat-file database system when you can just query a record by its physical path; the process of querying a collection of records (a table), such as searching, updating, deleting and sorting, is further simplified and extended by the use of query operatives, which are more secure than SQL. Speed is improved by rewriting a small file containing an individual record's information instead of rewriting the entire database. Folder-File Data Delegation also stores key maps, which further simplify the validation of new input data. Plus, you can create unlimited access portals for your application: when one portal is in use, a new portal file is cloned, and operations are carried out simultaneously rather than sequentially (warning: this can result in RAM overload; use at your own risk and limit the number of portals to a number suited to your CPU's power).
ii. Disclaimer: We do not claim that Folder-File Data Delegation is the fastest database ever; rather, we claim that speed is effectively improved by using this algorithm.

2. a. By offering a draft structured database system and draft-defined components such as collections, records and fields, Folder-File Data Delegation aims to improve the coherence of flat-file database systems.
i. Folder-File Data Delegation organizes data in a way that is effectively distributed, almost like a relational database's.

3. a. By offering a draft structured database system, NoSQL, and strict filtering of possible attacks on the database, Folder-File Data Delegation aims to improve the security of flat-file database systems.
i. Folder-File Data Delegation offers NoSQL data: everything in a query is interpreted as data and not as a command. Querying is just accessing a physical path plus transmitting operations, and must be done in the programming language being used. It's like a service company coming to your house to provide services, where your house is the particular record being manipulated.

4. a. By offering a draft structured system and draft documented and planned presentations of concepts, Folder-File Data Delegation aims to improve the security of flat-file database systems.
i. Folder-File Data Delegation offers a GUI, so you don't need to manually carry out operations and commands if you want to change things. Operations are also simplified with user-interface interactions.

5. a. By offering a draft structured system and abstractly planned dynamic updates to the database structure, Folder-File Data Delegation aims to improve the flexibility of flat-file database systems.
i. Folder-File Data Delegation is regularly updated and maintained, so don't worry if your database is ever attacked by a hacker; your database folder is stored in a hidden folder (.folder).