MySQL server design and scalability

Posted on 2024-12-31 23:56:05


I have a simple design for a DB used via SQL:

Table1 - recipeTable: key, value1..N, recipeId
Table2 - recipeInstructions: key, instId, text, recipeIdFK (each instruction has an entry)
Table3 - recipeIngredients: key, ingId, text, recipeIdFK (each ingredient has an entry)

The question is: when doing queries on the DB (joining information from all 3 tables), at what point will the table sizes become a problem (e.g. searches taking too much time)?

Should I do some clustering - like one table for ids 1..10000, another for the next range, and so on - or does SQL handle that in its own way?

The table size can get up to 100,000 rows and even more.

Yoav
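
For concreteness, the access pattern being asked about - joining information from all three tables for a recipe - might look like the sketch below. The table and column names follow the schema above; the query itself is an illustration added here, not part of the original question.

-- Hypothetical join across the three tables for one recipe.
SELECT r.recipeId,
       ri.text AS instruction,
       rg.text AS ingredient
FROM   recipeTable        r
JOIN   recipeInstructions ri ON ri.recipeIdFK = r.recipeId
JOIN   recipeIngredients  rg ON rg.recipeIdFK = r.recipeId
WHERE  r.recipeId = 42;

In practice the instructions and ingredients would usually be fetched in two separate queries to avoid multiplying rows, but the single statement shows the join paths involved.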


Comments (2)

蓝天白云 2025-01-07 23:56:05


Firstly, I think you may be missing a table for "ingredients".

Recipe
-------
RecipeID PK
Title
...

RecipeInstruction
------
RecipeInstructionID PK
RecipeID FK
sequence
description

Ingredient
---------
IngredientID
Name
....

RecipeIngredient
------------
RecipeID FK
IngredientID FK
quantity
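
Spelled out as MySQL DDL, the proposed schema might look roughly like the sketch below. The answer only lists column names, so the types, sizes and AUTO_INCREMENT choices here are illustrative assumptions.

-- Rough MySQL DDL for the schema sketched above; column types are assumptions.
CREATE TABLE Recipe (
    RecipeID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Title    VARCHAR(200) NOT NULL,
    PRIMARY KEY (RecipeID)
);

CREATE TABLE RecipeInstruction (
    RecipeInstructionID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    RecipeID            INT UNSIGNED NOT NULL,
    `sequence`          INT NOT NULL,
    description         TEXT NOT NULL,
    PRIMARY KEY (RecipeInstructionID),
    FOREIGN KEY (RecipeID) REFERENCES Recipe (RecipeID)
);

CREATE TABLE Ingredient (
    IngredientID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Name         VARCHAR(100) NOT NULL,
    PRIMARY KEY (IngredientID)
);

CREATE TABLE RecipeIngredient (
    RecipeID     INT UNSIGNED NOT NULL,
    IngredientID INT UNSIGNED NOT NULL,
    quantity     VARCHAR(50),
    PRIMARY KEY (RecipeID, IngredientID),
    FOREIGN KEY (RecipeID)     REFERENCES Recipe (RecipeID),
    FOREIGN KEY (IngredientID) REFERENCES Ingredient (IngredientID)
);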

Secondly, the best way to work on scalability questions is to try them out. Design your database, populate with test data (I've used DBMonster in the past), and then work on the queries you need to support. Go for perhaps twice as much data as you think you'll ever generate in real life. The fact you're using a DAL doesn't matter all that much - the DAL just generates the queries for you, and if you get the basics right, you can tune that in the future.
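
If a generator like DBMonster isn't to hand, one low-tech way to bulk up a test table (an illustration, not part of the answer) is to seed a few rows and keep doubling them:

-- Seed a handful of rows...
INSERT INTO Recipe (Title) VALUES ('Pancakes'), ('Soup'), ('Salad');

-- ...then repeat this statement; each run roughly doubles the row count.
-- Around 15-16 repetitions takes 3 seed rows to roughly 100,000-200,000 rows.
INSERT INTO Recipe (Title)
SELECT CONCAT(Title, ' copy') FROM Recipe;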

For your app, I'm guessing you'll want to run queries like:

  • get all recipes with ingredient x
  • get all recipes in category x
  • get all recipes without ingredient x
  • get all recipes that take less than x minutes
  • get all recipes with ingredients (x, y, z)
  • get all recipes with difficulty x

As well as combinations of the above. See if your queries work; if they do, leave well enough alone. Seriously, don't "optimize" just in case. If the queries don't work, tune them. Learn about indexing, and free text search.
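
As one concrete example of the shape these take (names assume the schema above), "get all recipes with ingredient x" and "get all recipes with ingredients (x, y, z)" are both join-and-filter queries:

-- All recipes that use a given ingredient.
SELECT r.RecipeID, r.Title
FROM   Recipe r
JOIN   RecipeIngredient ri ON ri.RecipeID = r.RecipeID
JOIN   Ingredient i        ON i.IngredientID = ri.IngredientID
WHERE  i.Name = 'banana';

-- All recipes that contain every ingredient in a given set.
SELECT r.RecipeID, r.Title
FROM   Recipe r
JOIN   RecipeIngredient ri ON ri.RecipeID = r.RecipeID
JOIN   Ingredient i        ON i.IngredientID = ri.IngredientID
WHERE  i.Name IN ('flour', 'egg', 'milk')
GROUP  BY r.RecipeID, r.Title
HAVING COUNT(DISTINCT i.Name) = 3;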

If that doesn't work, think about buying bigger hardware. In the long run, it's cheaper than exotic design solutions (like "one table for records up to 10000" - just imagine the data access logic for that...).

In very general terms, if your queries can use an index, they'll perform perfectly adequately joining several tables across many millions of records on modern, affordable hardware. It's safe to assume that all your "ID" columns will be indexed, and that searching by index will be fast.
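
With the schema above, InnoDB indexes the primary keys automatically and will index a declared foreign key column if no suitable index exists, but it is still worth making the join indexes explicit and checking them with EXPLAIN (the index names below are made up for illustration):

-- Explicit secondary indexes on the join/filter columns.
CREATE INDEX idx_instruction_recipe          ON RecipeInstruction (RecipeID);
CREATE INDEX idx_recipeingredient_ingredient ON RecipeIngredient (IngredientID);

-- Verify that a query actually uses them.
EXPLAIN
SELECT r.RecipeID, r.Title
FROM   Recipe r
JOIN   RecipeIngredient ri ON ri.RecipeID = r.RecipeID
WHERE  ri.IngredientID = 7;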

What's likely to be slow is searching on text values if you have to support wildcards (e.g. "where ingredient like '%banana%'"). That's where I'd put my energy - MySQL has support for free-text searching, which is pretty quick.
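
A minimal sketch of that, assuming the Ingredient table above (FULLTEXT indexes are supported on MyISAM and, in current MySQL versions, on InnoDB as well):

-- Replace LIKE '%banana%' scans with a full-text index.
ALTER TABLE Ingredient ADD FULLTEXT INDEX ft_ingredient_name (Name);

SELECT IngredientID, Name
FROM   Ingredient
WHERE  MATCH (Name) AGAINST ('banana' IN NATURAL LANGUAGE MODE);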

看海 2025-01-07 23:56:05


Should I do some clustering - like table for ids 1..10000 etc

No. Splitting your data into separate tables is a really dumb idea. OTOH, having one table distributed across multiple disks (or even multiple servers) is a good idea for performance. On MySQL this is usually referred to as partitioning. But there are also good reasons for using RAID-1.
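
For reference, partitioning in MySQL is declared on the table itself; a minimal sketch, reusing the Recipe columns from the first answer (the partition count is arbitrary, and the partitioning key must be part of the primary key):

-- Spread one logical table across several physical partitions.
CREATE TABLE RecipePartitioned (
    RecipeID INT UNSIGNED NOT NULL,
    Title    VARCHAR(200) NOT NULL,
    PRIMARY KEY (RecipeID)
)
PARTITION BY HASH (RecipeID)
PARTITIONS 4;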

As to whether your current approach is scalable... that's like asking how long a piece of string is. But if you're talking about keyword search, then forget about using MySQL FULLTEXT indexes and do it properly. It may work OK with your current data set / query requirements - but it's very limited in functionality and scalability.

can get up to 100,000

(I'm just taking a break from working on a database with 80,000,000 rows running on a single SATA disk, with no performance issues.)
