How can we drastically optimize (or replace) our MySQL database when using joins?

Posted on 2024-09-05 15:10:01

This is the first time I'm approaching an extremely high-volume situation. This is an ad server based on MySQL. However, the query that is used incorporates a lot of JOINs and is generally just slow. (This is Rails ActiveRecord, btw)

sel = Ads.find(:all,
  :select => '*',
  :joins => "JOIN campaigns ON ads.campaign_id = campaigns.id JOIN users ON campaigns.user_id = users.id LEFT JOIN countries ON countries.campaign_id = campaigns.id LEFT JOIN keywords ON keywords.campaign_id = campaigns.id",
  :conditions => [flashstr + "keywords.word = ? AND ads.format = ? AND campaigns.cenabled = 1 AND (countries.country IS NULL OR countries.country = ?) AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1", kw, format, viewer['country'][0]],
  :order => order,
  :limit => limit)

My questions:

Is there an alternative database like MySQL that has JOIN support, but is much faster? (I know there's PostgreSQL, still evaluating it.)
Otherwise, would firing up a MySQL instance, loading a local database into memory and re-loading that every 5 minutes help?
Otherwise, is there any way I could switch this entire operation to Redis or Cassandra, and somehow change the JOIN behavior to match the (non-JOIN-able) nature of NoSQL?

Thank you!

EDIT: here are more details:

Full executed SQL with flattened select (truncated above):

SELECT campaigns.id, campaigns.guid, campaigns.user_id, campaigns.dailylimit, campaigns.impressions, campaigns.cenabled, campaigns.dayspent, campaigns.dailyenabled, campaigns.fr,
       ads.id, ads.guid, ads.user_id, ads.campaign_id, ads.format, ads.enabled, ads.datafile, ads.data1, ads.data2, ads.originalfilename, ads.aid, ads.impressions,
       countries.id, countries.guid, countries.campaign_id, countries.country,
       keywords.id, keywords.campaign_id, keywords.word, keywords.bid
FROM ads
JOIN campaigns ON ads.campaign_id = campaigns.id
JOIN users ON campaigns.user_id = users.id
LEFT JOIN countries ON countries.campaign_id = campaigns.id
LEFT JOIN keywords ON keywords.campaign_id = campaigns.id
WHERE (keywords.word = 'design'
       AND ads.format = 10
       AND campaigns.cenabled = 1
       AND (countries.country IS NULL OR countries.country = 82)
       AND ads.enabled = 1
       AND campaigns.dailyenabled = 1
       AND users.uenabled = 1
       AND ads.datafile != '')
ORDER BY keywords.bid DESC
LIMIT 1,1

EXPLAIN/execution plan:

+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
| id | select_type | table     | type   | possible_keys    | key         | key_len | ref                                | rows | Extra                                        |
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | keywords  | ref    | campaign_id,word | word        | 257     | const                              |    9 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | ads       | ref    | campaign_id      | campaign_id | 4       | e_development.keywords.campaign_id |    8 | Using where                                  | 
|  1 | SIMPLE      | campaigns | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.keywords.campaign_id |    1 | Using where                                  | 
|  1 | SIMPLE      | users     | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.campaigns.user_id    |    1 | Using where                                  | 
|  1 | SIMPLE      | countries | ALL    | campaign_id      | NULL        | NULL    | NULL                               |    4 | Using where                                  | 
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+

(this is on a development database, which doesn't have nearly as many rows as the production version.)

DEFINED INDICES:

ads -> id (primary, autoinc) + aid (unique) + campaign_id (index) + user_id (index)
campaigns -> id (primary, autoinc) + user_id (index)
countries -> id (primary, autoinc) + campaign_id (index) + country (index) + user_id (index)
keywords -> id (primary, autoinc) + campaign_id (index) + word (index) + user_id (index)
users -> id (primary, autoinc)

忆依然 2024-09-12 15:10:01

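Database theory and established practice provide a framework for most situations. Not every database usage pattern fits neatly into third normal form, and that is where NoSQL comes in. These databases don't work well in most situations, but they work very well in specific ones. One of the reasons they work well is that they don't behave like a normal RDBMS. Cassandra does have some "join"-like facilities, but I don't remember the exact details. If you want a quick look, I'd recommend the Digg developer blog. There was a good, simple description there.

The thing is, I'd bet that joining 4 tables there would be slower than in MySQL. The only way to know for sure is to learn a new DBMS, install it, tune the install, tune MySQL as well, and set up all the data... and you'll find that MySQL does really well.

Trying to solve the exact same problem in the exact same way with a different engine isn't going to solve anything... You have to think like a NoSQL developer, not like an RDBMS developer who happens to use NoSQL.

But you can think about the problem the way Frustrated suggested.

Why do we have third normal form? Mainly for ease of updates: I update one row instead of dozens. It also helps constrain the data: if I carefully control what gets added to the countries table, I'll never end up with a bad country in the campaigns table. Beyond that, 3NF doesn't make queries faster, which is why we invented reporting databases, OLAP, cubes, and star schemas.

The point is that reporting is structured differently from editing/capturing.

As Frustrated said, determine how quickly the underlying data changes. I'd be shocked if you really added a country every 5 minutes. Campaigns? Probably occasionally. Ads? A few times a day. How long would it take to build a completely flattened table and index it? How many rows would that produce? If that build time is much shorter than the interval between updates... build it and see. Test the query speed. It's a much cheaper experiment than buying into a whole new database.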
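
To make the flattening idea concrete, here is a minimal sketch in plain Ruby. It is not the asker's code: the Struct definitions, `build_flat_index`, and `lookup` are hypothetical names, and in-memory arrays stand in for the real tables. In practice the same shape would be a denormalized table rebuilt by a periodic job. It assumes the enabled flags change slowly enough to be baked in at build time, which may not hold for `dailyenabled`.

```ruby
# Hypothetical stand-ins for the tables in the question (only the columns used).
Campaign = Struct.new(:id, :user_id, :cenabled, :dailyenabled)
User     = Struct.new(:id, :uenabled)
Ad       = Struct.new(:id, :campaign_id, :format, :enabled, :datafile)
Keyword  = Struct.new(:campaign_id, :word, :bid)
Country  = Struct.new(:campaign_id, :country)

# Run once per refresh cycle: do the joins and the static filters up front,
# then index the flat rows by (word, format) with the highest bid first.
def build_flat_index(ads, campaigns, users, keywords, countries)
  campaigns_by_id       = campaigns.to_h { |c| [c.id, c] }
  users_by_id           = users.to_h { |u| [u.id, u] }
  keywords_by_campaign  = keywords.group_by(&:campaign_id)
  countries_by_campaign = countries.group_by(&:campaign_id)

  flat = []
  ads.each do |ad|
    next unless ad.enabled == 1 && ad.datafile != ''
    campaign = campaigns_by_id[ad.campaign_id]
    next unless campaign && campaign.cenabled == 1 && campaign.dailyenabled == 1
    user = users_by_id[campaign.user_id]
    next unless user && user.uenabled == 1

    # A nil country mirrors the LEFT JOIN: the campaign has no country rows.
    targets = (countries_by_campaign[campaign.id] || []).map(&:country)
    targets = [nil] if targets.empty?

    (keywords_by_campaign[campaign.id] || []).each do |kw|
      targets.each do |country|
        flat << { word: kw.word, format: ad.format, country: country,
                  bid: kw.bid, ad_id: ad.id }
      end
    end
  end

  index = flat.group_by { |row| [row[:word], row[:format]] }
  index.each_value { |rows| rows.sort_by! { |row| -row[:bid] } }
  index
end

# Per-request work: one hash lookup plus a country filter, no joins.
def lookup(index, word, format, country)
  (index[[word, format]] || []).select do |row|
    row[:country].nil? || row[:country] == country
  end
end
```

Timing `build_flat_index` against production-sized data, and counting the rows it produces, answers the two questions above before any new database is involved.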
