MYSQL 值差优化

发布于 2024-11-27 21:22:33 字数 1132 浏览 0 评论 0原文

大家好,

我正在运行一个非常大的数据库(ATM,比如超过 500 万个数据集)。我的数据库存储自定义生成的数字(这些数字以及它们的组成方式在这里并不重要)以及与该数字相对应的日期。此外,每个产品都会存储一个 ID(意味着一个产品可以在我的数据库中具有不同日期的多个条目 -> 主键已划分)。现在我想SELECT那些在过去两天内数字差异最大的前 10 个 ID。目前,我尝试使用 JOINS 来实现这一点,但由于我获得了那么多数据集,这种方式速度太慢了。我怎样才能加快整个操作的速度?

SELECT 
  d1.place,d2.place,d1.ID 
FROM 
  daily 
INNER JOIN 
  daily AS d1 ON d1.date = CURDATE() 
INNER JOIN 
  daily as d2 ON d2.date = DATE_ADD(CURDATE(), INTERVAL -1 DAY) 
ORDER BY 
  d2.code-d1.code LIMIT 10

编辑:这就是我的结构的样子,

CREATE TABLE IF NOT EXISTS `daily` (
 `ID` bigint(40) NOT NULL,
 `source` char(20) NOT NULL,
 `date` date NOT NULL,
 `code` int(11) NOT NULL,
 `cc` char(2) NOT NULL,
 PRIMARY KEY (`ID`,`source`,`date`,`cc`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

这就是 Explain 语句的输出

    id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  d1  ALL PRIMARY NULL    NULL    NULL    5150350 Using where; Using temporary; Using filesort
1   SIMPLE  d2  ref PRIMARY PRIMARY 8   mytable.d1.ID   52  Using where

Hallo guys,

I'm running a very large database (ATM like >5 Million datasets). My database stores custom generated numbers (which and how they compose doesn't really matters here) and the corresponding date to this one. In addition there is a ID stored for every product (means one product can have multiple entries for different dates in my database -> primary key is divided). Now I want to SELECT those top 10 ID's which got the largest difference in theire numbers in the last two days. Currently I tried to achieve this using JOINS but since I got that much datasets this way is far to slow. How could I speed up the whole operation?

SELECT 
  d1.place,d2.place,d1.ID 
FROM 
  daily 
INNER JOIN 
  daily AS d1 ON d1.date = CURDATE() 
INNER JOIN 
  daily as d2 ON d2.date = DATE_ADD(CURDATE(), INTERVAL -1 DAY) 
ORDER BY 
  d2.code-d1.code LIMIT 10

EDIT: Thats how my structure looks like

CREATE TABLE IF NOT EXISTS `daily` (
 `ID` bigint(40) NOT NULL,
 `source` char(20) NOT NULL,
 `date` date NOT NULL,
 `code` int(11) NOT NULL,
 `cc` char(2) NOT NULL,
 PRIMARY KEY (`ID`,`source`,`date`,`cc`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Thats the output of the Explain Statement

    id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  d1  ALL PRIMARY NULL    NULL    NULL    5150350 Using where; Using temporary; Using filesort
1   SIMPLE  d2  ref PRIMARY PRIMARY 8   mytable.d1.ID   52  Using where

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

满地尘埃落定 2024-12-04 21:22:33

这个怎么样?

SELECT
  d1.ID, d1.place, d2.place
FROM
  daily AS d1
CROSS JOIN
  daily AS d2
USING (ID)
WHERE
  d1.date = CURDATE()
  AND d2.date = CURDATE() - INTERVAL 1 DAY
ORDER BY
  d2.code - d1.code DESC
LIMIT
  10

关于表结构的一些想法。

`ID` bigint(40) NOT NULL,

为什么选择 BIGINT?您需要一年、每周 7 天、每天 24 小时每秒执行 136 次插入,才能耗尽 INT 的范围。在您完成一半之前,您的应用程序可能需要一位专业的 DBA。
请记住,较小的主索引会导致较长的查找 - 这让我们想到:

PRIMARY KEY (`ID`,`source`,`date`,`cc`)

为什么? ID 列上的单列 PK 应该足够了。如果您需要其他列上的索引,请创建其他索引(并且明智地创建它)。事实上,您基本上拥有整个表的覆盖索引......这就像索引中包含整个表一样。

最后但并非最不重要的一点是:place 列在哪里?你在你的查询中使用了它(然后我在我的查询中使用过它),但它却无处可见?

建议的表结构:

CREATE TABLE IF NOT EXISTS `daily` (
 `ID` int(10) UNSIGNED NOT NULL,  --usually AUTO_INCREMENT is used as well,
 `source` char(20) NOT NULL,
 `date` date NOT NULL,
 `code` int(11) NOT NULL,
 `cc` char(2) NOT NULL,
 PRIMARY KEY (`ID`),
 KEY `ID_date` (`ID`,`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

How about this?

SELECT
  d1.ID, d1.place, d2.place
FROM
  daily AS d1
CROSS JOIN
  daily AS d2
USING (ID)
WHERE
  d1.date = CURDATE()
  AND d2.date = CURDATE() - INTERVAL 1 DAY
ORDER BY
  d2.code - d1.code DESC
LIMIT
  10

Some thoughts about your table structure.

`ID` bigint(40) NOT NULL,

Why BIGINT? You would need to be doing 136 inserts/s 24h a day, 7 days a week for a year to exhaust the range of INT. And before you get halfway there, your application will probably need a professional DBA anyway.
Remember, Smaller primary index leads to fater lookups - which brings us to:

PRIMARY KEY (`ID`,`source`,`date`,`cc`)

Why? A single column PK on ID column should be enough. If you need indexes on other columns, create additional indexes (and to it wisely). As it is, you basically have a covering index for entire table... which is like having entire table in the index.

Last but not least: where is place column? You've used it in your query (and then I in mine), but it's nowhere to be seen?

Proposed table structure:

CREATE TABLE IF NOT EXISTS `daily` (
 `ID` int(10) UNSIGNED NOT NULL,  --usually AUTO_INCREMENT is used as well,
 `source` char(20) NOT NULL,
 `date` date NOT NULL,
 `code` int(11) NOT NULL,
 `cc` char(2) NOT NULL,
 PRIMARY KEY (`ID`),
 KEY `ID_date` (`ID`,`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文