MySQL structure for a database with more than 10mm records

Published 2024-12-10 18:53:25

I am working with an application which has 3 tables, each with more than 10mm (10 million) records and larger than 2GB.

Every time data is inserted there's at least one record added to each of the three tables and possibly more.
After every INSERT a script is launched which queries all these tables in order to extract data relevant to the last INSERT (let's call this the aggregation script).

What is the best way to divide the DB into smaller units and across different servers so that the load for each server is manageable?

Notes:
1. There are in excess of 10 inserts per second, and hence the aggregation script is run the same number of times.
2. The aggregation script is resource-intensive.
3. The aggregation script has to be run on all the data in order to find which data is relevant to the last insert.
4. I have not found a way of somehow dividing the DB into smaller units.
5. I know very little about distributed DBs, so please use very basic terminology and provide links for further reading if possible.

Comments (1)

寻梦旅人 2024-12-17 18:53:25

There are two answers to this from a database point of view.

  1. Find a way of breaking up the database into smaller units. This is very dependent on the use of your database. This is really your best bet because it's the only way to get the database to look at less stuff at once. This is called sharding (a rough sketch follows after this list):
    http://en.wikipedia.org/wiki/Shard_(database_architecture)

  2. Have multiple "slave" databases in read-only mode. These are basically copies of your database (with a little lag). For any read-only queries where that lag is acceptable, the code across your site reads from these copies instead. This will take some load off the master database you are querying, but any particular query will still be resource-intensive.
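As a rough sketch of what sharding could look like here (the orders table and user_id key below are made up for illustration; the real shard key has to come from how your tables are actually queried): every server holds the same schema, the application picks one shard per row, and both the INSERT and the follow-up aggregation query go only to that shard, so no single server ever scans all 10mm+ rows.

    -- Hypothetical sketch: two MySQL servers (shard0, shard1) with identical schemas.
    -- The shard is chosen in application code, e.g. user_id % 2; MySQL does not route for you.

    -- Run the same DDL on shard0 AND shard1:
    CREATE TABLE orders (
      order_id   BIGINT        NOT NULL AUTO_INCREMENT PRIMARY KEY,
      user_id    BIGINT        NOT NULL,
      created_at DATETIME      NOT NULL,
      amount     DECIMAL(10,2) NOT NULL,
      KEY idx_user_created (user_id, created_at)
    );

    -- user_id 12345 -> 12345 % 2 = 1, so the application sends both the INSERT
    -- and the follow-up aggregation query to shard1 only:
    INSERT INTO orders (user_id, created_at, amount) VALUES (12345, NOW(), 19.99);
    SELECT COUNT(*), SUM(amount) FROM orders WHERE user_id = 12345;

The "slave"/replica setup from point 2 is not expressed in SQL like this; it is configured through MySQL replication, and the application simply opens read-only connections to the replicas for queries that can tolerate the lag.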

From a programming perspective, you already have nearly all of your information (aside from the ids). You could try to find some way of using that information for everything you need, rather than having to re-query the database after every insert. One option is a process that does nothing but create the ids, which you query first. Imagine you have tables A, B, C. You would also have tables whose only column is a primary key: A_ids, B_ids, C_ids. Step one, get new ids from the id tables. Step two, insert into A, B, and C and do whatever else you want to do at the same time.
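A minimal sketch of that id-table idea, assuming made-up names (A_ids as the generator, payload as a stand-in for A's real columns): the *_ids tables do nothing but hand out primary keys, so by the time you insert into A, B, and C you already hold every id and never need to query those large tables again for the last insert.

    -- Hypothetical id-generator tables; they hold nothing but keys.
    CREATE TABLE A_ids (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY);
    CREATE TABLE B_ids (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY);
    CREATE TABLE C_ids (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY);

    -- Step one: reserve a new id (shown for A; B and C work the same way).
    INSERT INTO A_ids () VALUES ();
    SET @a_id = LAST_INSERT_ID();

    -- Step two: insert into the real table using the id you already hold,
    -- so the aggregation work can be done with @a_id in hand instead of
    -- re-querying A afterwards.  (payload is a placeholder column name.)
    INSERT INTO A (id, payload) VALUES (@a_id, 'example payload');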

Also, general efficiency/performance of all queries should be reviewed. Make sure you have indexes on anything you are querying. Run EXPLAIN on all queries you are running to make sure they are using indexes.
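For instance (table and column names are again hypothetical), EXPLAIN shows whether a query will use an index or fall back to a full table scan, and adding the missing index is a one-line ALTER:

    -- Inspect the plan of an aggregation-style query; "type: ALL" means a full scan.
    EXPLAIN SELECT COUNT(*) FROM A WHERE user_id = 12345;

    -- If no index is used, add one on the column(s) the WHERE clause filters on.
    ALTER TABLE A ADD INDEX idx_user_id (user_id);

    -- Re-running EXPLAIN should now show "type: ref" with key = idx_user_id.
    EXPLAIN SELECT COUNT(*) FROM A WHERE user_id = 12345;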

This is really a mid-level/senior DBA type of task. Ask around your company and have someone lend you a hand and teach you.
