What is a good open-source database for learning database design? (Designing a DBMS, not normalizing tables, etc.)
As stated in the question, I'm not looking for help on database design in the terms of creating tables, normalization, etc.
As a programming project, I'm looking to write my own DBMS. This is for a learning experience more than anything, so reinventing the wheel is kinda the purpose.
I started my search by looking at SQLite. I found an old SVN branch from 2001-2004 that is amazingly well commented, but it's still a lot to digest all at once. Even so, I've been going through it for an hour or two, and my head is already in hyperdrive with ideas.
So I'm asking here hoping to see if anyone knows of a small and very basic DBMS that I could get some ideas or inspiration from as far as query parsing, storing data, building a search, etc.
Thanks!
I have been told that the PostgreSQL source code is very well documented and structured.
But it obviously does not qualify as a "small basic DBMS".
Apart from that, the only "small" ones I'm aware of are Java-based DBMSs:
Not sure if a Java-based implementation will help you.
There is Edward Sciore's SimpleDB (not related to Amazon's SimpleDB), "A Simple Java-Based Multiuser System for Teaching Database Internals". It's in Java, but I think the ideas will translate fairly easily to C.
From http://www.cs.bc.edu/~sciore/simpledb/intro.html:
There is a book too:
Database Design and Implementation
As stated previously, SQLite, JavaDB and SimpleDB are good examples. I would add Berkeley DB to the list. Berkeley DB is well documented, has been around for several years, and has several available APIs as well as multiple access methods like HASH, QUEUE and RECNO in addition to the traditional B-tree. Berkeley DB is a key/value database library written in C. Berkeley DB XML is an XML database library written in C++ on top of Berkeley DB. Berkeley DB Java Edition is a 100% Java key/value database library. All of them are available under a GPL-like license, and the source code is included in the distribution.
Berkeley DB's SQL API incorporates the SQLite API, basically implementing the BDB key/value pair data store underneath the SQLite query layer. Berkeley DB was also the first data storage implementation underneath MySQL, again taking a SQL query layer and storing the data in a simple key/data format. It's certainly an interesting way of looking at the problem: if you have a flexible, fast, scalable, reliable data store, you can then layer any type of API or data representation/abstraction on top of it. This is exactly what Berkeley DB does, providing a choice between the core key/value pair data storage or XML, SQL, Java Collections or a POJO-like Persistence Layer on top of the base key/value pair infrastructure.
Berkeley DB is about as close to a "pure" data storage engine as you're going to find. It makes no assumptions about the structure, content or format of the data being stored. It allows the upper layers to provide those abstractions while the lower layer focuses on fast, scalable, reliable storage. That's one of the reasons why Berkeley DB is so widely used: its simplicity and focus make it very fast, reliable and scalable.
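The layering idea can be sketched in a few lines of Python. This is not Berkeley DB's API: `KVStore` and `Table` are hypothetical names chosen for illustration. The lower layer only stores opaque byte keys and values (a dict stands in for an on-disk B-tree), and the upper layer alone decides how rows and keys are encoded, here as JSON values and fixed-width integer keys.

```python
import json
import struct
from typing import Dict, Iterator, Optional, Tuple


class KVStore:
    """Hypothetical storage layer: opaque bytes in, opaque bytes out.

    It knows nothing about rows, columns or types. A dict stands in for a
    real on-disk structure; scan() sorts on demand to mimic ordered iteration.
    """

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self) -> Iterator[Tuple[bytes, bytes]]:
        # Yield pairs in byte order of the key, as a B-tree-backed store would.
        for key in sorted(self._data):
            yield key, self._data[key]


class Table:
    """Hypothetical table layer built only on the KVStore interface."""

    def __init__(self, store: KVStore, name: str) -> None:
        self.store = store
        self.prefix = name.encode() + b"\x00"  # namespace each table's keys

    def _key(self, row_id: int) -> bytes:
        # Big-endian unsigned 64-bit integer: byte order equals numeric order.
        return self.prefix + struct.pack(">Q", row_id)

    def insert(self, row_id: int, row: dict) -> None:
        self.store.put(self._key(row_id), json.dumps(row).encode())

    def get(self, row_id: int) -> Optional[dict]:
        raw = self.store.get(self._key(row_id))
        return None if raw is None else json.loads(raw)

    def rows(self) -> Iterator[Tuple[int, dict]]:
        for key, value in self.store.scan():
            if key.startswith(self.prefix):
                (row_id,) = struct.unpack(">Q", key[len(self.prefix):])
                yield row_id, json.loads(value)


store = KVStore()
users = Table(store, "users")
users.insert(10, {"name": "ada"})
users.insert(2, {"name": "bob"})
print([row_id for row_id, _ in users.rows()])  # [2, 10]
```

Encoding the row id as fixed-width big-endian bytes is what lets the storage layer's plain byte-order sort double as numeric order; with decimal strings as keys, `"10"` would sort before `"2"`.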
Disclaimer: I'm one of the Product Managers for Berkeley DB, so clearly I'm a little biased. But, I've also been working on database products for 25 years and I know a little about DBMS internals. :-)
Good luck in your research.
Dave
You may have a look at the Apache Derby database. It's a full-fledged RDBMS implementation, although it's written entirely in Java.
It is definitely not a small and simple implementation, but it can serve as a good reference.
Maybe SQLite is a good start. It is as simple as possible (no network layer, simplistic locking, etc.), but it understands real SQL, has indexes and constraints, and is implemented in C. Its storage is peculiar, though.
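To get a feel for those properties before diving into the C source, Python's standard-library `sqlite3` module embeds SQLite directly in the calling process. The sketch below (the table, column and index names are made up for illustration) creates an in-memory database with constraints and an index, shows a constraint rejecting a row, and asks the query planner how it would run a query.

```python
import sqlite3

# ":memory:" keeps the whole database inside this process: no server and
# no network socket, just library calls.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"
    " age INTEGER CHECK (age >= 0))"
)
conn.execute("CREATE INDEX users_age ON users(age)")
conn.execute("INSERT INTO users (email, age) VALUES (?, ?)", ("ada@example.com", 36))

try:
    # Violates the UNIQUE constraint on email, so SQLite rejects the row.
    conn.execute("INSERT INTO users (email, age) VALUES (?, ?)", ("ada@example.com", 40))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# EXPLAIN QUERY PLAN reports whether the planner would use an index;
# the exact wording of its output differs between SQLite versions.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT email FROM users WHERE age > 30"):
    print(row)
```

Comparing the plan output with and without the `users_age` index is one quick way to see the parser, planner and storage layers interact.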
If you want a simple relational database system that uses the SQL query language, then SQLite is it. Keep on reading that code.
But if you are not hung up on fully relational data stores, then google for B+tree source code. The B+tree is the fundamental data structure that allows you to maintain a sorted index on disk, and 15-20 years ago there were several C source code packages that implemented it. It is much simpler because there is no SQL, and there are basically two parts: one to manage blocks on the disk and the other to manipulate the B+tree structure.
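As a rough picture of the second of those two parts, here is a minimal in-memory B+tree in Python supporting insert, point lookup and range scan (no deletion). The class names and the `max_keys` fan-out parameter are illustrative assumptions, not taken from any particular package. In a disk-based version, each node would typically occupy one fixed-size block managed by the block layer, and child pointers would be block numbers rather than Python references; that layer is left out here.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Optional, Tuple, Union


class Leaf:
    def __init__(self) -> None:
        self.keys: List[Any] = []
        self.values: List[Any] = []
        self.next: Optional["Leaf"] = None  # right sibling, used by range scans


class Internal:
    def __init__(self) -> None:
        self.keys: List[Any] = []      # separator keys
        self.children: List[Any] = []  # always len(keys) + 1 children


class BPlusTree:
    """Minimal in-memory B+tree: insert, point lookup, range scan (no delete)."""

    def __init__(self, max_keys: int = 4) -> None:
        self.max_keys = max_keys  # hypothetical per-node capacity
        self.root: Union[Leaf, Internal] = Leaf()

    def _find_leaf(self, key: Any) -> Leaf:
        node = self.root
        while isinstance(node, Internal):
            # Child i holds keys k with keys[i-1] <= k < keys[i].
            node = node.children[bisect_right(node.keys, key)]
        return node

    def search(self, key: Any) -> Optional[Any]:
        leaf = self._find_leaf(key)
        i = bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return None

    def range(self, lo: Any, hi: Any) -> List[Tuple[Any, Any]]:
        """All (key, value) pairs with lo <= key <= hi, in key order."""
        out: List[Tuple[Any, Any]] = []
        leaf: Optional[Leaf] = self._find_leaf(lo)
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.values):
                if k > hi:
                    return out
                if k >= lo:
                    out.append((k, v))
            leaf = leaf.next  # follow the sibling link instead of re-descending
        return out

    def insert(self, key: Any, value: Any) -> None:
        split = self._insert(self.root, key, value)
        if split is not None:  # the root split: the tree grows one level taller
            sep, right = split
            new_root = Internal()
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, value):
        """Insert under node; return (separator, new right node) if node split."""
        if isinstance(node, Leaf):
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # existing key: overwrite
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= self.max_keys:
                return None
            mid = len(node.keys) // 2
            right = Leaf()
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right  # leaf split copies the separator up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]  # internal split moves this key up to the parent
        right = Internal()
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right


tree = BPlusTree(max_keys=3)
for k in [5, 1, 9, 3, 7, 2, 8]:
    tree.insert(k, f"row{k}")
print(tree.search(7))    # row7
print(tree.range(2, 5))  # [(2, 'row2'), (3, 'row3'), (5, 'row5')]
```

The linked leaves are the design choice that makes B+trees good for range queries: after one descent to the first matching leaf, a scan just walks sibling pointers, which on disk means reading neighbouring blocks rather than re-traversing the tree.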
Once you understand that, you could go back to the SQLite code and no doubt identify similar modules amidst the rest of the code.
Sometimes the best way to learn is to retrace some historical steps.