多么“完美”啊数据库必须是吗?
我正在设计一个网站来跟踪举重和卡路里。我以前从未设计过自己的数据库,我认为在开始编码之前我应该尽可能地了解情况。每当我认为我已经完成时,我就会因为它不完美而焦躁不安。我已经在 MySQL Workbench 中充实了它,但我仍然有一些问题。
- 使之变得多么重要 数据库事先“完美”吗?如何 重构数据库很难 之后?
- 引擎选项有什么作用?正确的 现在所有表都默认为 InnoDB引擎?这有什么问题吗?
- 数据之后数据类型是否可以更改 入口?也许不会从 VARCHAR 开始 到 INT,但是 VARCHAR(45) 到 VARCHAR(255)?
- 必须的基本数据库规则是什么 成功的设计应该遵循什么?
我还附上了我的 ERM 图。如果有人能指出任何明显的违规行为,我们将不胜感激。
I'm designing a website to track weight lifting and calories. I've never designed my own database before and I figured I should get is as spot on as possible before I start coding. Every time I think I'm done, I get antsy that it's not perfect. I've mostly fleshed it out in MySQL Workbench but I still have a few questions.
- How important is it to make the
database "perfect" before hand? How
hard is it to re-factor a database
later? - What does the engine option do? Right
now all tables are defaulting to
InnoDB engine? Any issues with this? - Are dataypes changeable after data
entry? Maybe not going from VARCHAR
to INT, but what about VARCHAR(45) to
VARCHAR(255)? - What are basic database rules that HAVE to
be followed for a successful design?
I've also attached my ERM diagram. If anyone could point out any obvious irregularities it would be greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
数据库的变化是可以预料的,问题是如何管理它。至少按照 Boyce-Codd / 第五范式进行设计有助于避免设计中的内置偏差,并使模式在必要时更容易发展。
在需要之前要小心不要在设计中添加太多内容。如果您设计的模式太早于应用程序和使用它的其他数据使用者,那么您很可能需要在以后进行更多更改。应用 YAGNI 原则并应用
在开发过程中,我发现首先对模式进行尽可能的限制,然后在以后需要时放松这些限制是有意义的。例如,如果您不确定某些业务规则是否应该始终适用,那么最好还是强制执行它。放松约束或稍后将某些内容设为可选是很容易的,但是一旦构建了相关代码,就很难实现代码未设计用于适应的新规则。
小心“重构”这个词。数据库模式更改通常会改变数据库的含义和行为。从应用程序的角度来看,这可能是一次重构,但就数据库而言,大多数模式更改都是功能性的而不是非功能性的更改。
Database change is to be expected, the question is how to manage it. Designing to at least Boyce-Codd / 5th Normal Form helps avoid in-built bias in the design and makes the schema easier to evolve when you have to.
Be careful about adding too much to the design before it is needed. If you design the schema too far ahead of the applications and other data consumers who use it then you will very possibly have to make more changes later. Apply the YAGNI principle and apply Agile, iterative approaches to your project if you can.
During development I find that it makes sense to start by being as restrictive as possible with the schema and then ease those restrictions if you need to later. For instance if you aren't sure whether some business rule should always apply or not then it's better to enforce it anyway. It's easy to relax a constraint or make something optional later but once dependent code has been built it is much harder to implement new rules that the code was not designed to accommodate.
Be careful with the word "refactoring". Database schema changes generally alter the meaning and behaviour of the database. It may be a refactoring from the application point of view but as far as the database is concerned most schema change is functional rather than nonfunctional change.
完美是优秀的敌人。平庸和坏人也是好人的敌人。
正如其他人所说,学习如何标准化数据。至少达到 BCNF。稍后,您可以了解何时忽略规范化规则。
更一般地说,学习数据的关系模型。这方面有优秀的文本。初学者请参阅 CJ Date。不要等到您学习了整个模型才构建您的第一个数据库。
数据的关系模型通常会导致结果与相同数据的对象模型不匹配。了解如何应对这种不匹配。
做好应对变化的准备。
The perfect is the enemy of the good. The mediocre and the bad are also enemies of the good.
As others have said, learn how to normalize data. At least up to BCNF. Later, you can learn when to disregard the rules of normalization.
More generally, learn the relational model of data. There are outstanding texts in this regard. See CJ Date for starters. Don't wait till you have learned the entire model to build your first database.
The relational model of data often leads to results that appear to mismatch with an object model of the same data. Learn how to cope with this mismatch.
Be prepared to cope with change.
你的设计看起来很不错!
现在主要的事情是扔出你的用例和用户对话框,看看是否所有的流程都很容易支持。
完美的用户体验比完美的数据库更重要。
Your design looks pretty good!
The main thing now is to walk throw your use cases and user dialogs and see if all the flows are easily supported.
A perfect user experience is more important than a perfect DB.
从尽可能完美开始,然后根据需要使其尽可能不完美。
Start as perfect as you can make it, then make it as imperfect as you need it.
没有任何实用的、可操作的数据库是完美的。他们都有自己的缺点。
使用数据库的一大优势是它可以承受未来的变化。例如,使用存储过程进行数据更新和添加数据。如果表最终被拆分,则可以相应地修改存储过程,而无需任何外部软件产生可见的影响。
如果你等到设计完美为止,你将永远无法抽出时间来实现任何有用的东西。
No practical, operating database is perfect. They all have their warts.
One of the big advantages of using a database is that it can be built to withstand future change. For example, use stored procedures for data updates and adding data. If a table is eventually split, the stored procedure can be modified accordingly with no visible impact required for any external software.
If you wait until the design is perfect, you'll never get around to implementing anything useful.
一般来说,尝试让您的实体得到解决,但事后添加字段、索引等也没什么大不了的。重构实体以支持不同的基数在某种程度上涉及更多。
乍一看,有几件事突然出现......
1. 练习表中所有编号的字段。巨大的红旗。我会立即重构它。
2.徽章->用户徽章。即使您对名称强制执行唯一约束,我仍然会为该表(一个 id)创建一个新的 PK。
3. 我还需要一些时间来标准化您的 id 字段的命名(即在 User 中,您使用“userid”,但在 food 中,您使用“id”而不是“foodid”)。你走哪条路是一个选择的问题,但一定要尽量保持一致。
In general, try to get your entities worked out, but adding fields, indexes, etc. after the fact is no big deal. Refactoring entities to support different cardinalities is somewhat more involved.
At a glance, a couple things pop out...
1. All those numbered fields in your exercise table. HUGE red flag. I would refactor this right away.
2. badges->userbadges. Even if you're enforcing a unique constraint on name, I would still make a new PK for that table (an id).
3. I would also take some time to normalize the naming of your id fields (i.e. in User, you use 'userid' but in food, you use 'id' instead of 'foodid'). Which way you go is a matter of choice but do try to be consistent.
“完美的 SQL 数据库”的问题在于,SQL 缺乏使“完美的数据库”超越某个平凡水平的能力。
但是,听听老师的意见,让 Codd 感到自豪(3NF/BCNF 等等)。稍后添加缓存或临时非标准化数据(为了性能,在需要时)等比尝试修复充满不良数据的数据库要容易一些。数据需要爱。爱它。
话虽这么说,但我要指出一些事情:
exercise
看起来错误,错误,错误 - secondary1, secondary2, secondaryN ...不,谢谢!我不会像这样留下架构!userid
是 PK (varchar),但usernumber
是 int?再说一遍,不,谢谢。这似乎并不遵循与其他表相同的格式(大多数是id
),并且应该是一个提示,表明某些内容可能不正确。也许userid
(或username
?)应该只有一个唯一的约束(您也可以向其他列添加索引——PK通常在单调递增的值上“效果最好”,尽管数据库也应该有重新索引计划)。命名不一致(
userid
vsusernumber
vsid
)。不,谢谢。我不在乎你选择什么 - 但请始终如一地这样做。只是我的快速评论。
快乐发展。
The problem with a "perfect SQL database" is that SQL lacks the power to make a "perfect database" beyond a certain trivial level.
But, listen to your teachers and make Codd proud (3NF/BCNF and all that). It's a good bit easier to add caching or temporal un-normalized data later (for performance, when it's needed), etc, than it is to try and fix a database full of bad data. Data wants love. Love it.
That being said here are some thing I'll point out:
exercise
looks wrong, wrong, wrong -- secondary1, secondary2, secondaryN ... no thanks! I would not leave the schema like this!userid
is a PK (varchar) butusernumber
is an int? Again, no thanks. This doesn't appear to follow the same format as the other tables (most areid
) and should be a tip-off that something may not be right. Perhaps theuserid
(orusername
?) should just have a unique constraint (you can add indexes to other columns too -- PKs generally "work best" with monotonically increasing values, although DBs should have a re-index schedule as well).Inconsistent naming (
userid
vsusernumber
vsid
). No thanks. I don't care what you choose -- but please do it consistently.Just my quick review.
Happy developing.
它看起来相当不错......除了你的
练习
表。secondary1 到
secondary10 字段可能应该位于不同的表中。然而,您在其他地方都做得很好,所以也许这有意义吗?想一想并确定一下。
完美并不重要,事情可以(并且将会)在你完成后发生变化。尝试尽可能地正常化(你基本上已经做到了这一点)。字段大小可以更改,不用担心。
It looks pretty good ... except your
exercise
table. Thesecondary1
tosecondary10
fields smack of something that should probably be in a different table. However, you've done well everywhere else, so maybe this makes sense? Think about it and make sure.Perfectness doesn't matter, things can (and will) change after you're done. Try to normalize down as far as you can (you've done this mostly already). Field sizes can be changed, don't worry about that.
已经给出了很多好的建议。这里还有两点需要补充:
1 - 永远不要以纯文本形式存储密码。加盐,使用加密安全哈希对其进行哈希处理,然后存储盐/哈希值。
2 - 以 UTC 格式存储日期/时间,并在表示层转换为所需的时区。 (我总是将日期时间存储为unix时间戳,但这只是我的偏好。mysql DATETIME没有任何问题。)
我建议您在完全提交之前测试您的设计,并围绕它编写整个应用程序。将您的设计推向真正的 mysql 数据库,并编写一些查询以确保您可以添加和检索您想要的所有信息。一旦您的查询正常运行,请编写一个脚本来加载大量(千分之几到数百万条记录)的数据库并再次运行您的查询。这是测试您的设计的最佳方法。当您只有几百条记录时,几乎任何查询都会很快。一旦您的数据库达到临界质量并且不再适合内存,您将真正能够判断您的设计和索引是否针对您的预期用途进行了优化。
A lot of good advice has already been given. Here's two more points to add:
1 - Never ever store a password in plain-text. Salt it, hash it with a cryptographically secure hash, and store the salt/hash.
2 - Store date/time in UTC and convert to desired timezone at the presentation layer. (I always store datetime as a unix timestamp, but that's just my preference. Nothing wrong with the mysql DATETIME.)
I would recommend that you test your design before you fully commit to it, and write an entire application around it. Push your design up to a real mysql database and and write a few quires to make sure you can add and retrieve all the info that you want. Once you have your quires functioning write a script to load the database with lots(hundreds of thousandths to millions of records) and run your quires again. This is the best way to test out your design. When you only have few hundred records, pretty much any query will be quick. Once your database hits critical mass and no longer fits in ram you will you really be able to tell if your design and indexes are optimized for your intended use.