Optimizing a query that selects by time period

Posted on 2024-07-13 18:40:50

Given the following table:

Table events
id
start_time
end_time

Is there a way to quickly search for a constant, i.e. find the rows whose [start_time, end_time] interval contains a given timestamp?

E.g.

SELECT *
FROM events
WHERE start_time<='2009-02-18 16:27:12' 
AND     end_time>='2009-02-18 16:27:12'

I am using MySQL. An index on either field still has to scan a range. Moreover, a composite index on both fields does not make a difference (only its first column is used for the range lookup).
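To see why a single-column index is not enough, here is a small Python sketch (an illustration, not MySQL internals) that models an index on start_time as a sorted list. `bisect` finds the boundary of `start_time <= t` in O(log n), but every row before that boundary is still a candidate that must be checked against end_time, so the work grows with the number of rows that started before t rather than with the number of matches. The function name and sample data are hypothetical.

```python
from bisect import bisect_right

def lookup_with_start_index(events, t):
    """Model of `WHERE start_time <= t AND end_time >= t` using only an
    index on start_time. `events` is a list of (start, end) pairs sorted
    by start. Returns (matches, number_of_rows_examined)."""
    # The "index" key column; built here for brevity, a real index is already sorted.
    starts = [s for s, _ in events]
    boundary = bisect_right(starts, t)        # rows [0, boundary) have start <= t
    candidates = events[:boundary]            # the index range that must be scanned
    matches = [(s, e) for s, e in candidates if e >= t]  # residual filter on end_time
    return matches, len(candidates)

# 10,000 one-unit events laid end to end: exactly one contains t = 9000.5
events = [(i, i + 1) for i in range(10_000)]
matches, examined = lookup_with_start_index(events, 9_000.5)
print(matches)    # [(9000, 9001)]
print(examined)   # 9001
```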

I can add fields / indexes to the table (so adding an indexed constructed field containing the info of both fields would be acceptable).

P.S. The need for this came from this question: Optimize SQL that uses between clause

Comments (6)

把梦留给海 2024-07-20 18:40:50

There is one caveat to my solution: you must use the MyISAM engine for the events table. If you cannot use MyISAM, this solution won't work, because in the MySQL versions this targets only MyISAM supports spatial indexes (InnoDB gained SPATIAL index support in MySQL 5.7.5).

So, assuming that the above isn't an issue for you, the following should work and give you good performance:

This solution makes use of MySQL's support for spatial data (see the MySQL documentation on spatial extensions). While spatial data types can be used with a variety of storage engines, only MyISAM supports the spatial R-tree indexes (see the MySQL documentation on creating spatial indexes) that are needed to get the required performance. One other limitation is that spatial data types only work with numeric data, so you cannot use this technique for string-based range queries.

I won't go into the theory behind how spatial types work and why the spatial index is useful, but you should look at Jeremy Cole's explanation of how to use spatial data types and indexes for GeoIP lookups. Also read the comments on that post: they raise some useful points and alternatives if you need raw performance and can give up some accuracy.

The basic premise is that we can take the start/end values and use the two of them to create four distinct points, one for each corner of a rectangle on an x-y grid that spans from start to end along the x axis and from -1 to 1 along the y axis. A quick lookup in the spatial index then determines whether the point (t, 0), where t is the particular point in time we care about, lies within the rectangle or not. As mentioned previously, see Jeremy Cole's explanation for a more thorough overview of how this works.
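The geometric trick can be checked without MySQL. The plain-Python sketch below (helper names are hypothetical) builds the same rectangle as the UPDATE in step 3 (x from start to end, y from -1 to 1) and tests whether the point (t, 0) falls inside its bounding box, which is the kind of test MBRContains performs for an axis-aligned rectangle. Because the point's y coordinate is always 0, the test reduces to start <= t <= end, i.e. the original WHERE clause; the R-tree only makes finding candidate rectangles fast. The sketch uses inclusive edges to match the `<=`/`>=` of the question; whether a point exactly on an edge counts as contained depends on the MySQL version's MBRContains semantics, so test boundary timestamps if that matters.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def interval_to_rect(start: float, end: float) -> Rect:
    """Rectangle with corners (start,-1), (end,-1), (end,1), (start,1),
    mirroring the polygon built from unix_timestamp(start_time/end_time)."""
    return Rect(start, -1.0, end, 1.0)

def rect_contains_point(r: Rect, x: float, y: float) -> bool:
    """Bounding-box containment with inclusive edges."""
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax

def interval_contains(start: float, end: float, t: float) -> bool:
    # The query point is always placed on the x axis: (t, 0).
    return rect_contains_point(interval_to_rect(start, end), t, 0.0)

# The geometric test agrees with the original BETWEEN-style predicate.
for start, end, t in [(100, 200, 150), (100, 200, 99), (100, 200, 200), (100, 200, 201)]:
    assert interval_contains(start, end, t) == (start <= t <= end)
print(interval_contains(100, 200, 150))  # True
```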

In your particular case we will need to do the following:

1) Alter the table to be a MyISAM table (note: you shouldn't do this unless you are fully aware of the consequences of such a change, such as the lack of transactions and the table-level locking behavior associated with MyISAM).

alter table events engine = MyISAM;

2) Next we add the new column that will hold the spatial data. We will use the polygon data type as we need to be able to hold a full rectangle.

alter table events add column time_poly polygon NOT NULL;

3) Next we populate the new column with the data (please keep in mind that any processes that update or insert into the events table will need to be modified so that they also populate the new column). Since the start and end values are times, we need to convert them to numbers with the UNIX_TIMESTAMP() function (see the MySQL documentation for UNIX_TIMESTAMP() for how it works).

-- POLYGON() needs a closed ring, so the first corner is repeated at the end
update events set time_poly := GEOMFROMWKB(POLYGON(LINESTRING(
    POINT(unix_timestamp(start_time), -1),
    POINT(unix_timestamp(end_time), -1),
    POINT(unix_timestamp(end_time), 1),
    POINT(unix_timestamp(start_time), 1),
    POINT(unix_timestamp(start_time), -1)
  )));
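Step 3 notes that every process that inserts into or updates events must also fill time_poly. If that code lives in the application, one option is to send the rectangle as Well-Known Text and let MySQL parse it with a WKT constructor (GeomFromText() in older versions, ST_GeomFromText() in newer ones). The Python helper below is a hypothetical sketch that builds the same five-point ring as the UPDATE above from two Unix timestamps; it assumes the caller has already converted start_time and end_time to integers the same way UNIX_TIMESTAMP() would (mind the session time zone).

```python
def time_poly_wkt(start_ts: int, end_ts: int) -> str:
    """Return WKT for the ring (start,-1) -> (end,-1) -> (end,1) ->
    (start,1) -> back to (start,-1), mirroring the UPDATE in step 3."""
    if end_ts < start_ts:
        raise ValueError("end_ts must not be earlier than start_ts")
    ring = [(start_ts, -1), (end_ts, -1), (end_ts, 1), (start_ts, 1), (start_ts, -1)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"

# Hypothetical usage with a DB-API driver that uses %s placeholders:
#   cur.execute("INSERT INTO events (start_time, end_time, time_poly) "
#               "VALUES (FROM_UNIXTIME(%s), FROM_UNIXTIME(%s), GeomFromText(%s))",
#               (s, e, time_poly_wkt(s, e)))
print(time_poly_wkt(100, 200))  # POLYGON((100 -1, 200 -1, 200 1, 100 1, 100 -1))
```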

4) Next we add the spatial index to the table (as mentioned previously, this only works for a MyISAM table; on an engine without spatial index support you will get the error "ERROR 1464 (HY000): The used table type doesn't support SPATIAL indexes").

alter table events add SPATIAL KEY `IXs_time_poly` (`time_poly`);

5) Next you will need to use the following select in order to make use of the spatial index when querying the data.

SELECT * 
FROM events force index (IXs_time_poly)
WHERE MBRCONTAINS(events.time_poly, POINTFROMWKB(POINT(unix_timestamp('2009-02-18 16:27:12'), 0)));

The FORCE INDEX hint is there to make sure that MySQL uses the index for the lookup. If everything went well, running EXPLAIN on the above SELECT should show something similar to the following:

mysql> explain SELECT *
    -> FROM events force index (IXs_time_poly)
    -> WHERE MBRCONTAINS(events.time_poly, POINTFROMWKB(POINT(unix_timestamp('2009-02-18 16:27:12'), 0)));
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
|  1 | SIMPLE      | B     | range | IXs_time_poly | IXs_time_poly | 32      | NULL |    1 | Using where | 
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)

Please refer to Jeremy Cole's analysis for details about the performance benefits of this method as compared with a between clause.

Let me know if you have any questions.

Thanks,

-Dipin

我最亲爱的 2024-07-20 18:40:50

There is no efficient way to do exactly this query in MySQL.

If your ranges do not overlap, though, you can just use start_time <= const along with ORDER BY start_time DESC LIMIT 1 and further checking for end_time >= const.

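You'll need to do it in a function because, for some reason, MySQL doesn't use an index range scan for ORDER BY ... LIMIT in a subquery when the range condition comes from the outer query. In the example below, b is the table of ranges (the events table from the question) and a holds the timestamps (event_time) to look up.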

CREATE UNIQUE INDEX ux_b_start ON b (start_time);

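DELIMITER $$

CREATE FUNCTION `fn_get_last_b`(event_date TIMESTAMP) RETURNS int(11)
    READS SQL DATA
BEGIN
  DECLARE last_id INT;
  -- latest range starting at or before event_date (walks ux_b_start backwards)
  SELECT b.id
  INTO last_id
  FROM b
  FORCE INDEX (ux_b_start)
  WHERE b.start_time <= event_date
  ORDER BY
    b.start_time DESC
  LIMIT 1;
  RETURN last_id;
END$$

DELIMITER ;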

SELECT COUNT(*) FROM a;

1000


SELECT COUNT(*) FROM b;

200000

SELECT *
FROM (
  SELECT fn_get_last_b(a.event_time) AS bid,
         a.*
  FROM a
) ao, b FORCE INDEX (PRIMARY)
WHERE b.id = ao.bid
  AND b.end_time >= ao.event_time

1000 rows fetched in 0,0143s (0,1279s)
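Why this only works for non-overlapping ranges is easiest to see outside the database. The Python sketch below (hypothetical names, in-memory data instead of table b) does the same two steps as fn_get_last_b plus the outer join: find the range with the greatest start_time <= t, which is a single O(log n) probe, then check that range's end_time >= t. If ranges could overlap, an earlier-starting range might also contain t and would be missed.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int, int]  # (id, start_time, end_time)

def find_containing_range(ranges: Sequence[Range], starts: Sequence[int], t: int) -> Optional[int]:
    """Return the id of the range containing t, assuming `ranges` are
    non-overlapping and sorted by start_time, and `starts` holds their
    start_time values in the same order (the role of ux_b_start)."""
    i = bisect_right(starts, t) - 1      # last range with start_time <= t
    if i < 0:
        return None                      # t is before every range
    rid, _, end = ranges[i]
    return rid if end >= t else None     # the extra end_time >= t check

ranges = [(1, 0, 9), (2, 10, 19), (3, 30, 39)]   # gap between 19 and 30
starts = [s for _, s, _ in ranges]
print(find_containing_range(ranges, starts, 15))  # 2
print(find_containing_range(ranges, starts, 25))  # None (falls in the gap)
```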
南冥有猫 2024-07-20 18:40:50

I don't have much experience with MySQL, but on MS SQL Server, adding an index on both columns allowed an index seek, and query times on a 1M-row table went from 1-2 seconds to millisecond response times.

It seems that you're seeing different results. I wonder if a constraint is making the difference. I have a check constraint to enforce that start_time < end_time.

流殇 2024-07-20 18:40:50

You've basically got a query with two distinctly separate range conditions. You are using <= and >=, and to MySQL each of these is always a range condition. The MySQL manual documents how to optimize range scans in its "Range Optimization" section.

The bottom line is that MySQL uses the index to find the rows that satisfy one range condition, then performs an additional check on each of those rows to satisfy the rest of the WHERE clause, which in your case is another range condition.

时光病人 2024-07-20 18:40:50

I was going to ask a similar question on optimizing searches for events (items with a start & stop time), and I'm already using a different approach, so I'll throw it out there.

Basically, if you know that your events are never larger than a given duration, you can search for a bounded range that's larger than the max duration, then add restrictions to get rid of the extra stuff that matched. So, to get times that intersect with the search time:

SET @search_start = '2009-02-18 16:27:12', @search_end = '2009-02-18 16:27:12';

SELECT *
FROM events
WHERE
   ( start_time BETWEEN ( @search_start - INTERVAL 2 DAY ) AND @search_end )
   AND end_time >= @search_start;

... you'll want to have an index on start_time.

(Note -- my table has millions of events spread over 4 years, with no record more than 24hrs ... I have no idea how this performs relative to the spatial search approach, as I'm going to have to go try that myself.)
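Here is a runnable sketch of the bounded-duration trick using Python's built-in sqlite3 module rather than MySQL, with integer epoch seconds instead of DATETIME and a hypothetical MAX_DURATION of 2 days to match the INTERVAL 2 DAY above. The BETWEEN on start_time gives the index a closed range whose size depends only on the window, and the end_time test discards the extra rows it pulls in. Results are only correct if no event lasts longer than MAX_DURATION.

```python
import sqlite3

MAX_DURATION = 2 * 24 * 3600  # assumed upper bound on end_time - start_time (seconds)

def events_overlapping(conn, search_start, search_end):
    """Ids of events intersecting [search_start, search_end], assuming no
    event lasts longer than MAX_DURATION."""
    sql = """
        SELECT id FROM events
        WHERE start_time BETWEEN ? AND ?   -- bounded range on the index
          AND end_time >= ?                -- discard the extra rows
        ORDER BY id
    """
    rows = conn.execute(sql, (search_start - MAX_DURATION, search_end, search_start))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, end_time INTEGER)")
conn.execute("CREATE INDEX ix_events_start ON events (start_time)")
# One event per hour, each lasting 90 minutes, so neighbours overlap.
conn.executemany(
    "INSERT INTO events (id, start_time, end_time) VALUES (?, ?, ?)",
    [(i, i * 3600, i * 3600 + 5400) for i in range(1000)],
)
t = 100 * 3600 + 1800          # a single point in time: search_start == search_end
print(events_overlapping(conn, t, t))  # [99, 100]
```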

瑾夏年华 2024-07-20 18:40:50

There's not much you can do within the one table. If optimizing these queries (1) is necessary and (2) must be done at the SQL level, then you'll need to make a derived table:

Table event_times
id
event_id
mark_time

and add a record to it for every unit of time spanned by each event. Then you just

SELECT events.*
FROM events
JOIN event_times ON event_times.event_id = events.id
WHERE event_times.mark_time = '2009-02-18 16:27:12'

You can make this table a good bit less ridiculous by how you define 'unit of time', i.e. if you limit the resolution of mark_time to minutes or hours rather than seconds.
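A minimal sketch of the derived-table idea, again using Python's sqlite3 module instead of MySQL and integer epoch seconds. It assumes a resolution of one minute; the helper names and sample data are hypothetical. Because mark_time only holds whole minutes, the lookup constant has to be truncated to the same resolution, and a bucket match is then only approximate (an event that ends a few seconds into a minute still appears for the whole minute), so this sketch re-checks start_time and end_time to get exact answers. The index on mark_time turns the lookup into an equality search.

```python
import sqlite3

UNIT = 60  # resolution of mark_time in seconds (one minute, an assumption)

def add_event(conn, event_id, start, end):
    """Insert an event plus one event_times row per minute bucket it spans."""
    conn.execute("INSERT INTO events (id, start_time, end_time) VALUES (?, ?, ?)",
                 (event_id, start, end))
    first, last = start // UNIT, end // UNIT
    conn.executemany("INSERT INTO event_times (event_id, mark_time) VALUES (?, ?)",
                     [(event_id, b * UNIT) for b in range(first, last + 1)])

def events_at(conn, t):
    """Equality lookup on the truncated bucket, then an exact re-check."""
    bucket = (t // UNIT) * UNIT
    rows = conn.execute("""
        SELECT events.id FROM events
        JOIN event_times ON event_times.event_id = events.id
        WHERE event_times.mark_time = ?
          AND events.start_time <= ? AND events.end_time >= ?
        ORDER BY events.id""", (bucket, t, t))
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, end_time INTEGER)")
conn.execute("CREATE TABLE event_times (id INTEGER PRIMARY KEY, event_id INTEGER, mark_time INTEGER)")
conn.execute("CREATE INDEX ix_event_times_mark ON event_times (mark_time)")
add_event(conn, 1, 0, 600)       # spans minutes 0 through 10
add_event(conn, 2, 300, 305)     # five seconds inside minute 5
print(events_at(conn, 302))      # [1, 2]
print(events_at(conn, 310))      # [1]  (event 2 shares the bucket but has ended)
```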
