SELECT takes a long time. How can I fix this?

Posted on 2024-10-31 09:25:18 · 3260 characters · 1 view · 0 comments

I have a fairly large MySQL database, about 300 MB, with 4 tables: the first is about 200 MB and the second about 80 MB.
The first table has 150,000 records and the second has 200,000.

My queries use INNER JOIN on these tables.

With query optimization and indexes, the SELECT takes 3 seconds (before that it took about 20-30 seconds).
That is a decent result, but I need better, because the page takes 7-8 seconds to load (3-4 for the select, 1 for the count, 1 for other small queries, and 1-2 for page generation).

So what should I do next? Would Postgres be faster than MySQL? Or would it be better to use memcached? In that case it could take a lot of memory, since there are too many sorting variants.

Does anybody have another idea? I would be glad to hear it :)


OK, I see we need the queries :)
I renamed the fields of table_1.

     CREATE TABLE  `table_1` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `field` varchar(2048) DEFAULT NULL,
      `field` varchar(2048) DEFAULT NULL,
      `field` int(10) unsigned DEFAULT NULL,
      `field` text,
      `field` text,
      `field` text,
      `field` varchar(128) DEFAULT NULL,
      `field` text,
      `field` text,
      `field` text,
      `field` text,
      `field` text,
      `field` varchar(128) DEFAULT NULL,
      `field` text,
      `field` varchar(4000) DEFAULT NULL,
      `field` varchar(4000) DEFAULT NULL,
      `field` int(10) unsigned DEFAULT '1',
      `field` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `field` text,
      `new` tinyint(1) NOT NULL DEFAULT '0',
      `applications` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `indexNA` (`new`,`applications`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=153235 DEFAULT CHARSET=utf8;

CREATE TABLE  `table_2` (
  `id_record` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `catalog_name` varchar(512) NOT NULL,
  `catalog_url` varchar(4000) NOT NULL,
  `parent_id` int(10) unsigned NOT NULL DEFAULT '0',
  `checked` tinyint(1) NOT NULL DEFAULT '0',
  `level` int(10) unsigned NOT NULL DEFAULT '0',
  `work` int(10) unsigned NOT NULL DEFAULT '0',
  `update` int(10) unsigned NOT NULL DEFAULT '1',
  `type` int(10) unsigned NOT NULL DEFAULT '0',
  `hierarchy` varchar(512) DEFAULT NULL,
  `synt` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id_record`,`type`) USING BTREE,
  KEY `rec` (`id_record`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=14504 DEFAULT CHARSET=utf8;

CREATE TABLE  `table_3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id_table_1` int(10) unsigned NOT NULL,
  `id_category` int(10) unsigned NOT NULL,
  `work` int(10) unsigned NOT NULL DEFAULT '1',
  `update` int(10) unsigned NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `site` (`id_table_1`,`id_category`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=203844 DEFAULT CHARSET=utf8;

The queries are:
1) Get the total count (takes less than 1 second):

SELECT count(table_1.id) FROM table_1
 INNER JOIN table_3 ON table_3.id_table_id = table_1.id
 INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE ((table_2.type = 0)
AND (table_3.work = 1 AND table_2.work = 1)
 AND (table_1.new = 1))AND 1 IN (table_1.applications)

2) Get the list for a page with a limit (takes 3 to 7 seconds, depending on the count):

SELECT table_1.field, table_1.field, table_1.field, table_1.field, table_2.catalog_name FROM table_1
 INNER JOIN table_3 ON table_3.id_table_id = table_1.id
 INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE ((table_2.type = 0)
AND (table_3.work = 1 AND table_2.work = 1)
 AND (table_1.new = 1))AND 1 IN (table_1.applications) LIMIT 10 OFFSET 10


Comments (6)

孤云独去闲 2024-11-07 09:25:18

Do Not Change the DBMS

I would not suggest changing your DBMS; it may be very disruptive. If you have used MySQL-specific queries that are not compatible with Postgres, you might need to rewrite them and redo all the indexing. Even then, a performance improvement is not guaranteed.

Caching Is a Good Option

Caching is a really good idea. It takes load off your DBMS. It is best suited to read-heavy, write-light workloads, because objects then stay in the cache longer. Memcached is a good caching mechanism and really simple. Rapidly scaling sites (Facebook and the like) make heavy use of Memcached to take load off the database.

How to Scale Up Really Big

You do not have very much data, so caching would most likely help you. The next step beyond caching is a NoSQL-based solution like Cassandra. We use Cassandra in one of our applications, which has heavy read and write operations (50:50) and a large, fast-growing database. Cassandra gives good performance. But I guess in your case Cassandra is overkill.

But...

Before you dive into any serious changes, I would suggest really looking into indexes. Try scaling vertically. Look into slow queries (search for the slow query logging directive). Hopefully MySQL will be faster after optimizing these things and you will not need additional tools.
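
As a rough illustration of the slow query logging mentioned above, the following sketch turns the log on at runtime. It assumes a reasonably recent MySQL server and an account allowed to set global variables; the one-second threshold is only an example value, not something taken from the question.

```sql
-- Assumption: run by a privileged account; the threshold is illustrative.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;          -- log statements slower than 1 second

-- Check the current settings and where the log file is written.
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
```

A changed global `long_query_time` applies to connections opened afterwards, so the application may need to reconnect before its queries show up in the log.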

朕就是辣么酷 2024-11-07 09:25:18

You should look into indexes specific to your most frequent and most time-consuming queries. Check this post on indexing for MySQL.
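
For the two queries in the question, one possible query-specific index is sketched below. The index name is hypothetical, and the choice of columns is an assumption based only on the posted schema and joins, so it should be checked with EXPLAIN before being kept.

```sql
-- Hypothetical index name. Column order follows the join from table_1 into
-- table_3: look up by id_table_1, filter on work, then read id_category
-- from the index without touching the row.
ALTER TABLE table_3
    ADD INDEX idx_t1_work_category (id_table_1, `work`, id_category);

-- table_2's index `rec` (id_record) repeats the leading column of its
-- primary key (id_record, type), so it is a candidate for removal.
-- Left commented out on purpose; confirm with EXPLAIN first.
-- ALTER TABLE table_2 DROP INDEX rec;

-- List the indexes that now exist on the join table.
SHOW INDEX FROM table_3;
```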

临风闻羌笛 2024-11-07 09:25:18

Aside from the suggestions others have offered, I've slightly altered the queries, though I'm not certain of the performance impact under MySQL. I've added STRAIGHT_JOIN so the optimizer doesn't try to decide the join order for you.

Next, I moved the "AND" conditions into the respective JOIN clauses for tables 2 and 3.

Finally, the join from table 1 to table 3 had (in your post)

   table_3.id_table_id = table_1.id

instead of

   table_3.id_table_1 = table_1.id

Additionally, I can't speak to the performance, but it may help to have a stand-alone index on just the "new" column, so the exact match happens first without regard to the "applications" column. I don't know whether the compound index is causing an issue, since you are using "IN" on applications, which is not truly an indexable search.

Here are the modified queries:

SELECT STRAIGHT_JOIN 
        count(table_1.id) 
    FROM 
        table_1
            JOIN table_3 
                ON table_1.id = table_3.id_table_1
                   AND table_3.work = 1
                JOIN table_2 
                    ON table_3.id_category = table_2.id_record
                    AND table_2.type = 0
                    AND table_2.work = 1
    WHERE 
            table_1.new = 1
        AND 1 IN (table_1.applications)


SELECT STRAIGHT_JOIN 
        table_1.field, 
        table_1.field, 
        table_1.field, 
        table_1.field, 
        table_2.catalog_name 
    FROM 
        table_1
            JOIN table_3 
                ON table_1.id = table_3.id_table_1
                AND table_3.work = 1
                JOIN table_2 
                    ON table_3.id_category = table_2.id_record
                    AND table_2.type = 0
                    AND table_2.work = 1
    WHERE 
            table_1.new = 1
        AND 1 IN (table_1.applications)
    LIMIT 10 OFFSET 10
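
The stand-alone index on "new" suggested in this answer could be tried out roughly as follows. This is only a sketch: the index name `idx_new` is hypothetical, and whether it beats the existing compound index `indexNA` is exactly what the EXPLAIN output is meant to show.

```sql
-- Hypothetical index name, added only for the experiment.
ALTER TABLE table_1 ADD INDEX idx_new (`new`);

-- Ask the optimizer for its plan when it is pushed towards the new index.
EXPLAIN
SELECT count(table_1.id)
FROM table_1 FORCE INDEX (idx_new)
JOIN table_3
    ON table_1.id = table_3.id_table_1
   AND table_3.work = 1
JOIN table_2
    ON table_3.id_category = table_2.id_record
   AND table_2.type = 0
   AND table_2.work = 1
WHERE table_1.new = 1
  AND 1 IN (table_1.applications);

-- If the plan and timings are no better, remove the index again.
ALTER TABLE table_1 DROP INDEX idx_new;
```

Note that `applications` is a varchar column compared with the number 1, so MySQL has to convert each value before comparing; that is one likely reason the `applications` part of `indexNA` does not help here.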

此生挚爱伱 2024-11-07 09:25:18

You should also optimize your query.

最终幸福 2024-11-07 09:25:18

Without a look at the statements, this question can only be answered in theory. Just a few ideas to consider...

The SELECT-Statement...

First of all, make sure that your query is as "good" as it can be. Are there any indexes you might have missed? Do the indexed join columns have the same field types? Can you narrow the query down so the database has less to work on?

The Query cache...

If your query is repeated often, it might help to use the query cache or, if you're already using it, to give it more RAM.
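
A minimal sketch of inspecting and enlarging the query cache is shown below. It assumes a MySQL 5.x server: the query cache was deprecated in 5.7.20 and removed in 8.0, so none of this applies to newer versions. The 64 MB size is an illustrative value.

```sql
-- Assumption: MySQL 5.x only; these variables do not exist in MySQL 8.0.
SHOW VARIABLES LIKE 'query_cache%';     -- current type, size and limits
SHOW STATUS LIKE 'Qcache%';             -- hits, inserts, low-memory prunes

-- Give the cache more memory (64 MB here, purely as an example).
SET GLOBAL query_cache_size = 67108864;
```

If the server was started with `query_cache_type` set to 0, the cache may have to be enabled in the configuration file and the server restarted before the size setting has any effect.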

The Hardware...

Of course some RDBMSs are slower or faster than others, depending on their strengths and weaknesses, but if your query is already optimized into oblivion, you can only make it faster by scaling up the database server (better CPU, better I/O and so on, depending on where the bottleneck is).

Other Factors...

If all of this is maxed out, maybe try speeding up the other components (1-2 seconds for page generation looks pretty slow to me).

For all the factors mentioned, there are plenty of ideas and posts on stackoverflow.com.

最佳男配角 2024-11-07 09:25:18

That is actually not such a big database, and certainly not too much for your database system. For comparison, the database we are using is currently around 40 GB. It is on MS SQL Server, so it's not directly comparable, but there is no dramatic difference between the database systems.

My guess is that you haven't been completely successful in using indexes to speed up the query. You should look at the execution plan for the query and see if you can spot which part of the execution takes most of the time.
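
In MySQL the execution plan can be requested with EXPLAIN. The sketch below applies it to the paged query from the question, with two assumptions: the join column is written as `id_table_1` (the name in the posted schema), and `table_1.id` stands in for the anonymized `field` columns.

```sql
-- Assumption: table_1.id replaces the anonymized columns of the original query.
EXPLAIN
SELECT table_1.id, table_2.catalog_name
FROM table_1
INNER JOIN table_3 ON table_3.id_table_1 = table_1.id
INNER JOIN table_2 ON table_2.id_record = table_3.id_category
WHERE table_2.type = 0
  AND table_3.work = 1
  AND table_2.work = 1
  AND table_1.new = 1
  AND 1 IN (table_1.applications)
LIMIT 10 OFFSET 10;
```

In the output, a `type` of `ALL` marks a full table scan, `key` shows which index (if any) was chosen for each table, and a large `rows` estimate points at the step that is doing most of the work.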
