MySQL: Optimizing a SELECT over 6 huge identical tables that hold different data split by timestamp

Posted on 2024-09-18 01:12:06

I have the same problem as the one described here:

MySQL - Selecting data from multiple tables all with same structure but different data

I have to select data from several MySQL tables with identical structure but different data (split into table_0, table_1, table_2, and so on up to table_5, to distribute millions of records).

The hardware generating the data records for each device moves from table to table according to the timestamp field, which is NOT unique; e.g. 50 records in table_0 may have the same timestamp. When the data reaches the end of table_5, it wraps around to table_0 and starts overwriting the data there. I need to get the data for each device within a time range.

Each table (table_0, table_1, ... up to table_5) has the same data columns:
timestamp, robotGroupID, robotID, sensor1, sensor2, sensor3, ... (many of them)

However, the tables are HUGE, and the UNION ALL (which I read is faster than a plain UNION, because it skips the implicit DISTINCT) takes forever to execute, even with just two tables, let alone six. I illustrate it for two tables below.

MySQL statement in PHP (illustrated for just sensor1, sensor2 and sensor3):

```php
$query = "(SELECT sensor1, sensor2, sensor3 FROM table_0 WHERE robotID=" . $robotID . " AND timestamp BETWEEN " . $timeStampStart . " AND " . $timeStampStop . ")"
       . " UNION ALL "
       . "(SELECT sensor1, sensor2, sensor3 FROM table_1 WHERE robotID=" . $robotID . " AND timestamp BETWEEN " . $timeStampStart . " AND " . $timeStampStop . ")";
```
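For comparison, here is a sketch of the same UNION ALL generated in a loop, with PDO placeholders instead of values concatenated into the SQL, which also removes the SQL injection risk of splicing $robotID into the string. The helper name buildUnionQuery is hypothetical, and it assumes the table and column names come from a fixed list in your own code, since identifiers cannot be bound as parameters.

```php
<?php
// Hypothetical helper: builds one UNION ALL query over the given tables and
// returns it together with the values to bind, in placeholder order.
function buildUnionQuery(array $tables, array $columns, $robotID, $start, $stop): array
{
    $cols = implode(', ', $columns);
    $parts = [];
    $params = [];
    foreach ($tables as $table) {
        // Table and column names cannot be bound as parameters, so they must
        // come from a fixed list in your own code, never from user input.
        $parts[] = "(SELECT $cols FROM `$table` WHERE robotID = ? AND timestamp BETWEEN ? AND ?)";
        // Each sub-select has its own three placeholders, bound in this order.
        array_push($params, $robotID, $start, $stop);
    }
    return [implode(' UNION ALL ', $parts), $params];
}
```

With a PDO connection in $pdo, you would pass the returned SQL to $pdo->prepare() and the returned values to execute(). Check for an empty table list first, because implode() over no parts yields an empty string.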

N.B. It is exactly the same query except for the table name. Sensor data for a robot within a time range may span none, one, or several of the tables.
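Since a robot's data can land in any subset of the tables, one way to skip tables cheaply is to compare each table's overall time span with the requested range. A sketch, assuming you first fetch each span with SELECT MIN(timestamp), MAX(timestamp) FROM table_N (fast when timestamp is the first column of an index); the helper name and the shape of $spans are hypothetical. The test is conservative: any matching row lies between its table's MIN and MAX, so a table with matching rows is never dropped, although a table that is currently being overwritten may be kept even when it has none.

```php
<?php
// Hypothetical helper: $spans maps table name => [min timestamp, max timestamp].
// Returns only the tables whose span overlaps the closed range [$start, $stop].
function tablesOverlappingRange(array $spans, $start, $stop): array
{
    $keep = [];
    foreach ($spans as $table => [$min, $max]) {
        // MIN/MAX return NULL for an empty table: nothing to query there.
        if ($min === null || $max === null) {
            continue;
        }
        // Two closed intervals overlap unless one ends before the other starts.
        if ($min <= $stop && $max >= $start) {
            $keep[] = $table;
        }
    }
    return $keep;
}
```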

I cannot use LIMIT because the number of reports from robots within each time range cannot be known ahead of time. I can't use the MERGE storage engine because I only have read-only access to the company's database.

I have an idea to run COUNT(robotID) or similar on each table to check before running the queries, but I'm not sure how to go about it because I'm quite a novice.

How do you think I can make this work faster for 6 tables and many more columns (there are many more columns than illustrated)? Thanks in advance!

蓬勃野心 2024-09-25 01:12:06

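Are the RobotID and Timestamp fields indexed?

I would at least add a multi-column index on (RobotId, Timestamp).

You said you have read-only access to the tables, but could you ask for that index to be added? I believe it would help both the original query you posted and the updated one.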
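With read-only access you can at least check whether such an index already exists. A sketch that inspects the rows returned by SHOW INDEX FROM table_N (fetched with PDO::FETCH_ASSOC) and reports whether any index starts with robotID, timestamp; the helper name is hypothetical. If none does, the statement to hand to the DBA would be along the lines of ALTER TABLE table_0 ADD INDEX idx_robot_ts (robotID, timestamp), repeated for each table, where idx_robot_ts is only an example name.

```php
<?php
// Hypothetical helper: given the rows of "SHOW INDEX FROM table_N" as
// associative arrays, report whether any index begins with $wanted columns.
function hasLeadingIndex(array $showIndexRows, array $wanted = ['robotID', 'timestamp']): bool
{
    // Group column names by index name, keyed by position (Seq_in_index starts at 1).
    $indexes = [];
    foreach ($showIndexRows as $row) {
        $indexes[$row['Key_name']][(int) $row['Seq_in_index']] = strtolower($row['Column_name']);
    }
    // MySQL column names are case-insensitive, so compare in lower case.
    $wanted = array_map('strtolower', $wanted);
    foreach ($indexes as $columns) {
        ksort($columns);
        // Re-key from 0 and keep only as many leading columns as we need.
        $leading = array_slice(array_values($columns), 0, count($wanted));
        if ($leading === $wanted) {
            return true;
        }
    }
    return false;
}
```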

寄居者 2024-09-25 01:12:06

I must confess I'm still a novice PHP/MySQL coder, but with many ideas, so my code is probably "dirty".

I solved the problem this way in order to move forward, but better solutions are welcome. As for any strange syntax: I am using a database class built on top of PHP PDO, because I am using several different RDBMS types on this project.

In my real $myQuery_start I list the other columns as well, not just sensor1 to sensor3:
```php
$myQuery_start = "(SELECT sensor1, sensor2, sensor3 FROM ";
$myQueryCount_start = "(SELECT COUNT(*) FROM ";
$myQuery_stop = " WHERE robotID=" . $robotID . " AND timestamp BETWEEN " . $timeStampStart . " AND " . $timeStampStop . ")";
```

```php
$count_0 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_0" . $myQuery_stop)->fetchColumn();
$count_1 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_1" . $myQuery_stop)->fetchColumn();
$count_2 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_2" . $myQuery_stop)->fetchColumn();
$count_3 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_3" . $myQuery_stop)->fetchColumn();
$count_4 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_4" . $myQuery_stop)->fetchColumn();
$count_5 = DB::getDB("mysql", $myDB)->query($myQueryCount_start . "table_5" . $myQuery_stop)->fetchColumn();
```
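The counts are only used to decide whether a table has any matching rows, so an existence check that can stop at the first match may be cheaper than COUNT(*), which has to visit every row in the range. A sketch assuming a plain PDO connection rather than the DB class above; filterTablesWithRows and makeHasRows are hypothetical names, and passing the check in as a callback keeps the filtering logic testable without a database.

```php
<?php
// Hypothetical helper: keep only the tables for which $hasRows returns true.
function filterTablesWithRows(array $tables, callable $hasRows): array
{
    // array_values re-indexes the result as 0, 1, 2, ...
    return array_values(array_filter($tables, $hasRows));
}

// Hypothetical existence check with PDO: "LIMIT 1" lets MySQL stop at the
// first matching row instead of counting every row in the range.
function makeHasRows(PDO $pdo, $robotID, $start, $stop): callable
{
    return function (string $table) use ($pdo, $robotID, $start, $stop): bool {
        // $table must come from a fixed list; identifiers cannot be bound.
        $stmt = $pdo->prepare(
            "SELECT 1 FROM `$table` WHERE robotID = ? AND timestamp BETWEEN ? AND ? LIMIT 1"
        );
        $stmt->execute([$robotID, $start, $stop]);
        // fetchColumn() returns false when there is no row.
        return $stmt->fetchColumn() !== false;
    };
}
```

Usage would look like $nonEmpty = filterTablesWithRows($allTables, makeHasRows($pdo, $robotID, $timeStampStart, $timeStampStop)), where $allTables lists table_0 to table_5.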

Now I check whether UNION ALL needs to be appended to each table's query. There is no need for a UNION ALL if none of the following tables has data records to attach.

```php
$union_0 = (($count_1 + $count_2 + $count_3 + $count_4 + $count_5) > 0) ? " UNION ALL " : "";
$union_1 = (($count_2 + $count_3 + $count_4 + $count_5) > 0) ? " UNION ALL " : "";
$union_2 = (($count_3 + $count_4 + $count_5) > 0) ? " UNION ALL " : "";
$union_3 = (($count_4 + $count_5) > 0) ? " UNION ALL " : "";
$union_4 = ($count_5 > 0) ? " UNION ALL " : "";
```
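The $union_N variables exist only to avoid a dangling UNION ALL. A sketch of a shorter equivalent: collect the sub-queries of the non-empty tables in an array and let implode() place UNION ALL only between them. The $counts array (table name => row count, standing in for $count_0 to $count_5) and the helper name are assumptions.

```php
<?php
// Hypothetical helper: join the per-table sub-queries of the tables that have
// rows. implode() only puts the separator between elements, so there is never
// a leading or trailing UNION ALL.
function joinNonEmpty(array $counts, string $queryStart, string $queryStop): string
{
    $parts = [];
    foreach ($counts as $table => $count) {
        // fetchColumn() returns counts as strings; "0" > 0 is false, "5" > 0 is true.
        if ($count > 0) {
            $parts[] = $queryStart . $table . $queryStop;
        }
    }
    return implode(' UNION ALL ', $parts);
}
```

An empty result means no table has matching rows, so the final query can be skipped.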

Now I build the per-table queries and combine them into the full query:

```php
$query_0 = ($count_0 > 0) ? $myQuery_start . "table_0" . $myQuery_stop . $union_0 : "";
$query_1 = ($count_1 > 0) ? $myQuery_start . "table_1" . $myQuery_stop . $union_1 : "";
$query_2 = ($count_2 > 0) ? $myQuery_start . "table_2" . $myQuery_stop . $union_2 : "";
$query_3 = ($count_3 > 0) ? $myQuery_start . "table_3" . $myQuery_stop . $union_3 : "";
$query_4 = ($count_4 > 0) ? $myQuery_start . "table_4" . $myQuery_stop . $union_4 : "";
$query_5 = ($count_5 > 0) ? $myQuery_start . "table_5" . $myQuery_stop : "";
```

Then I concatenate them:

```php
$myQuery = $query_0 . $query_1 . $query_2 . $query_3 . $query_4 . $query_5;
```

Finally, $myQuery is executed to fetch all the required data.

This is roughly 8 times faster than my previous UNION ALL version, so I think the approach is valid. Any suggestions for further optimization?
