How can we greatly optimize (or replace) our MySQL database when using joins?

Posted on 2024-09-05 15:10:01

This is the first time I'm approaching an extremely high-volume situation. This is an ad server based on MySQL. However, the query that is used incorporates a lot of JOINs and is generally just slow. (This is Rails ActiveRecord, btw)

sel = Ads.find(:all, :select => '*', :joins => "JOIN campaigns ON ads.campaign_id = campaigns.id JOIN users ON campaigns.user_id = users.id LEFT JOIN countries ON countries.campaign_id = campaigns.id LEFT JOIN keywords ON keywords.campaign_id = campaigns.id", :conditions => [flashstr + "keywords.word = ? AND ads.format = ? AND campaigns.cenabled = 1 AND (countries.country IS NULL OR countries.country = ?) AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1", kw, format, viewer['country'][0]], :order => order, :limit => limit)

My questions:

  1. Is there an alternative database like MySQL that has JOIN support, but is much faster? (I know there's PostgreSQL; I'm still evaluating it.)

  2. Otherwise, would firing up a MySQL instance, loading a local database into memory and re-loading that every 5 minutes help?

  3. Otherwise, is there any way I could switch this entire operation to Redis or Cassandra, and somehow change the JOIN behavior to match the (non-JOIN-able) nature of NoSQL?

Thank you!


EDIT: here are more details:

Full executed SQL with flattened select (truncated above):

SELECT campaigns.id, campaigns.guid, campaigns.user_id, campaigns.dailylimit, campaigns.impressions, campaigns.cenabled, campaigns.dayspent, campaigns.dailyenabled, campaigns.fr,
       ads.id, ads.guid, ads.user_id, ads.campaign_id, ads.format, ads.enabled, ads.datafile, ads.data1, ads.data2, ads.originalfilename, ads.aid, ads.impressions,
       countries.id, countries.guid, countries.campaign_id, countries.country,
       keywords.id, keywords.campaign_id, keywords.word, keywords.bid
FROM ads
JOIN campaigns ON ads.campaign_id = campaigns.id
JOIN users ON campaigns.user_id = users.id
LEFT JOIN countries ON countries.campaign_id = campaigns.id
LEFT JOIN keywords ON keywords.campaign_id = campaigns.id
WHERE (keywords.word = 'design'
       AND ads.format = 10
       AND campaigns.cenabled = 1
       AND (countries.country IS NULL OR countries.country = 82)
       AND ads.enabled = 1
       AND campaigns.dailyenabled = 1
       AND users.uenabled = 1
       AND ads.datafile != '')
ORDER BY keywords.bid DESC
LIMIT 1,1

EXPLAIN/execution plan:

+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
| id | select_type | table     | type   | possible_keys    | key         | key_len | ref                                | rows | Extra                                        |
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | keywords  | ref    | campaign_id,word | word        | 257     | const                              |    9 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | ads       | ref    | campaign_id      | campaign_id | 4       | e_development.keywords.campaign_id |    8 | Using where                                  | 
|  1 | SIMPLE      | campaigns | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.keywords.campaign_id |    1 | Using where                                  | 
|  1 | SIMPLE      | users     | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.campaigns.user_id    |    1 | Using where                                  | 
|  1 | SIMPLE      | countries | ALL    | campaign_id      | NULL        | NULL    | NULL                               |    4 | Using where                                  | 
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+

(This is on a development database, which has far fewer rows than the production version.)

DEFINED INDICES:

ads -> id (primary, autoinc) + aid (unique) + campaign_id (index) + user_id (index)
campaigns -> id (primary, autoinc) + user_id (index)
countries -> id (primary, autoinc) + campaign_id (index) + country (index) + user_id (index)
keywords -> id (primary, autoinc) + campaign_id (index) + word (index) + user_id (index)
users -> id (primary, autoinc)

Comments (2)

忆依然 2024-09-12 15:10:01

Database theory and nominal practice exist to provide a framework for the majority of cases. Not every database usage pattern fits neatly into third normal form, hence the emergence of NoSQL. These databases don't work well in the majority of cases but work great in specific ones. One reason they work well is that they DON'T work like a normal RDBMS. Cassandra does have some facility for 'joining', but I don't remember the exact details. If you want a quick understanding, I'd recommend the Digg developers' blog, which has a nice, simple description.

The problem is that I'll bet you a pickle that joining 4 tables there would be slower than in MySQL. The only way to know for sure would be to learn a new DBMS, install it, tune the install as well as you can tune MySQL, and set up all your data... and you'll likely find out that MySQL does pretty damn well.

Trying to solve the EXACT SAME problem the EXACT SAME way with a different engine won't cut it. You have to THINK like a NoSQL developer, not an RDBMS developer using NoSQL.

But you can think about the problem as Frustrated suggests.

Why do we have third normal form? Mainly ease of update: I update one row instead of dozens. It also helps constrain data: if I carefully control the addition of countries in the country table, I'll never get a bad one in the campaign table. Beyond that, 3NF doesn't make querying faster, which is why we invented reporting databases, OLAP, cubes, and star schemas.

The key is that reporting calls for a different structure than editing/capturing.

As Frustrated said, determine the speed of change in your underlying data. If you're really adding countries every 5 minutes, I'll be stunned. Campaigns? Probably occasionally. Ads? A couple of times a day. How long would it take to build a fully flattened table and index it? How many rows does that produce? If that cycle time is much shorter than the interval between updates, build it and see. Test the query speed. That's a cheaper experiment than going for a whole new DB.
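
As a rough illustration of that experiment, here is a minimal plain-Ruby sketch that flattens the joined tables once, indexes the result by keyword and ad format, and reports how many rows it produced and how long the build took. The sample rows, the method names `build_flat_index` and `pick_ads`, and the in-memory arrays standing in for the MySQL tables are all assumptions made for illustration; only the column names come from the question. A real version would build an actual table in the database.

```ruby
require 'benchmark'

# Invented sample rows standing in for the normalized MySQL tables.
USERS     = [{ id: 1, uenabled: 1 }, { id: 2, uenabled: 0 }]
CAMPAIGNS = [
  { id: 10, user_id: 1, cenabled: 1, dailyenabled: 1 },
  { id: 11, user_id: 1, cenabled: 1, dailyenabled: 1 },
  { id: 12, user_id: 2, cenabled: 1, dailyenabled: 1 }
]
ADS = [
  { id: 100, campaign_id: 10, format: 10, enabled: 1, datafile: 'a.swf' },
  { id: 101, campaign_id: 11, format: 10, enabled: 1, datafile: 'b.swf' },
  { id: 102, campaign_id: 12, format: 10, enabled: 1, datafile: 'c.swf' }
]
KEYWORDS = [
  { campaign_id: 10, word: 'design', bid: 0.5 },
  { campaign_id: 11, word: 'design', bid: 0.9 },
  { campaign_id: 12, word: 'design', bid: 2.0 }
]
COUNTRIES = [{ campaign_id: 10, country: 82 }]

# Performs the joins once and returns a hash keyed by [word, format].
# Each value is a list of flattened rows, already sorted by bid (highest first).
def build_flat_index(users, campaigns, ads, keywords, countries)
  enabled_users = users.select { |u| u[:uenabled] == 1 }.map { |u| u[:id] }
  live_campaigns = campaigns.select do |c|
    c[:cenabled] == 1 && c[:dailyenabled] == 1 && enabled_users.include?(c[:user_id])
  end.map { |c| c[:id] }
  ads_by_campaign = ads.select { |a| a[:enabled] == 1 && a[:datafile] != '' }
                       .group_by { |a| a[:campaign_id] }
  countries_by_campaign = countries.group_by { |c| c[:campaign_id] }

  index = Hash.new { |hash, key| hash[key] = [] }
  keywords.each do |kw|
    next unless live_campaigns.include?(kw[:campaign_id])
    targets = (countries_by_campaign[kw[:campaign_id]] || []).map { |c| c[:country] }
    (ads_by_campaign[kw[:campaign_id]] || []).each do |ad|
      index[[kw[:word], ad[:format]]] << {
        ad_id: ad[:id], campaign_id: kw[:campaign_id], bid: kw[:bid], countries: targets
      }
    end
  end
  index.each_value { |rows| rows.sort_by! { |r| -r[:bid] } }
  index
end

# Serving an ad becomes a key lookup plus a country filter; no joins at request time.
# An empty country list means the campaign is not restricted to any country.
def pick_ads(index, word, format, country, limit = 1)
  index.fetch([word, format], [])
       .select { |r| r[:countries].empty? || r[:countries].include?(country) }
       .first(limit)
end

flat = nil
seconds = Benchmark.realtime do
  flat = build_flat_index(USERS, CAMPAIGNS, ADS, KEYWORDS, COUNTRIES)
end
row_count = flat.values.sum(&:size)
puts "built #{row_count} flattened rows in #{seconds.round(6)}s"
p pick_ads(flat, 'design', 10, 82).map { |r| r[:ad_id] }
```

If the measured build time is small compared with how often ads, campaigns, and keywords actually change, rebuilding the flattened structure on a schedule is likely worth trying before evaluating a new database.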

街道布景 2024-09-12 15:10:01

Have you analysed your execution plan? Have you analysed your indices?

My first guess would be that you need an index on campaigns for user_id, an index on countries for campaign_id, one on keywords for campaign_id... and maybe others. You need to get an execution plan to see what your query is doing.

The other option: How often does the data in this result set change? By the minute? Hour? Day? If it's daily or hourly (well, several hours), it might be better to have a secondary table that contains ALL the columns (or just the columns that aren't likely to change frequently) of this result set and is populated by this query every n hours. Then your app would just query the secondary table (or maybe join it with one table that has frequently changing data), which could be faster.
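
One possible shape for that secondary table, sketched in plain Ruby under stated assumptions: the class `SecondaryTable`, the method `servable_ads`, and the sample rows are hypothetical, and the rule that a campaign stops serving once its live spend reaches `dailylimit` is a guess at how those columns are used. The snapshot holds the slow-changing columns and is rebuilt once it is older than the refresh interval. The fast-changing spend figure is looked up live on each request.

```ruby
# Hypothetical wrapper: caches the result of an expensive query and rebuilds it
# once the snapshot is at least refresh_seconds old.
class SecondaryTable
  attr_reader :rebuilds

  # clock is injectable so the refresh logic can be exercised without waiting.
  def initialize(refresh_seconds, clock: -> { Time.now }, &builder)
    @refresh_seconds = refresh_seconds
    @clock = clock
    @builder = builder
    @rows = nil
    @built_at = nil
    @rebuilds = 0
  end

  # Returns the cached rows, rebuilding them first if they are missing or stale.
  def rows
    now = @clock.call
    if @rows.nil? || now - @built_at >= @refresh_seconds
      @rows = @builder.call
      @built_at = now
      @rebuilds += 1
    end
    @rows
  end
end

# Combines the snapshot with frequently changing data at read time.
# live_spend maps campaign_id to the amount spent today (assumed semantics).
def servable_ads(snapshot, live_spend)
  snapshot.rows.select { |r| live_spend.fetch(r[:campaign_id], 0) < r[:dailylimit] }
end

# Stand-in for the multi-JOIN query from the question; the rows are invented.
expensive_join = lambda do
  [
    { ad_id: 100, campaign_id: 10, dailylimit: 50.0 },
    { ad_id: 101, campaign_id: 11, dailylimit: 20.0 }
  ]
end

table = SecondaryTable.new(3 * 60 * 60, &expensive_join) # refresh every 3 hours
live_spend = { 10 => 12.5, 11 => 20.0 }
p servable_ads(table, live_spend).map { |r| r[:ad_id] } # prints [100]
```

In a database-backed version, the builder would be the existing query writing into a real table on a schedule, and the live lookup would be a single join against the campaigns table.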
