PHP 中的日期匹配和插值

发布于 2024-11-28 01:54:34 字数 1540 浏览 1 评论 0原文

序言：我有一个大型数据集，其中存储有数十万条记录，存储在 MySQL 数据库中。大大简化后，每一行都有一个日期时间字段来存储拨打电话的日期和时间，以及一个整数字段来存储通话时长。

场景：我正忙于在 PHP 中编写一个插值函数，该函数生成一系列由预先计算的间隔分隔的日期。每个生成的日期都存储在一个关联数组中，并以日期为键，每个值都初始化为 0。然后，该脚本在数据库中查询记录列表，并尝试将日期时间记录与预置中最接近的日期进行匹配。生成的关联数组。当找到最接近的匹配时，它只是将调用持续时间添加到该索引处的数组的现有值中。

生成关联数组的示例：

$array    =   array(  "2011-01-01 09:00:00" => 0,
                      "2011-01-01 09:30:00" => 0,
                      "2011-01-01 10:00:00" => 0,
                      "2011-01-01 10:30:00" => 0,
                      "2011-01-01 11:00:00" => 0,
                      "2011-01-01 11:30:00" => 0,
                      "2011-01-01 12:00:00" => 0
                   )

在上面的示例中，使用 30 分钟的间隔生成日期范围。

来自 MySQL 数据库的记录示例：

+---------------------+----------+
| datetime            | duration |
+---------------------+----------+
| 2011-01-01 09:02:26 |        1 |
| 2011-01-01 09:14:51 |        1 |
| 2011-01-01 10:40:33 |      549 |
| 2011-01-01 11:10:27 |       38 |
| 2011-01-01 11:31:50 |       82 |
+---------------------+----------+

现在，每条记录都需要与上面给出的预生成数组中最接近的日期时间键相匹配，并将 duration 值添加到匹配的现有值。

问题： 构造两个嵌套的 for 循环来交互数据库中的记录，然后线性地运行关联数组来查找匹配项是很容易的，但是这是非常低效的，并且对于大型数据集来说会出现问题（想想冒泡排序，这大致相当于）。稍微好一点的方法是线性循环数据库中的记录，然后将数组作为二叉树进行迭代，这当然更有效，并且是可能的，因为两个数组都是按时间顺序排序的。

问题： 有没有比我在上述问题中描述的方式更有效的处理此日期匹配的方法？

原文

Prelude: I have a large dataset with several hundred thousand records stored in a MySQL database. Greatly simplified, each row has a datetime field to store the date and time at which a telephone call was made and an integer field to store the length of the call.

Scenario: I'm busy writing an interpolation function in PHP which generates a range of dates separated by a pre-calculated interval. Each generated date is stored in an associative array with the date used as the key, and each value initialized to 0. The script then queries the database for a list of records and attempts to match the datetime record to the closest date in the pre-generated associative array. When the closest match is found, it simply adds the call duration to the existing value of the array at that index.

Example of the generated associative array:

$array    =   array(  "2011-01-01 09:00:00" => 0,
                      "2011-01-01 09:30:00" => 0,
                      "2011-01-01 10:00:00" => 0,
                      "2011-01-01 10:30:00" => 0,
                      "2011-01-01 11:00:00" => 0,
                      "2011-01-01 11:30:00" => 0,
                      "2011-01-01 12:00:00" => 0
                   )

In the above example, a range of dates is generated using an interval of 30 minutes.

Example of records from the MySQL database:

+---------------------+----------+
| datetime            | duration |
+---------------------+----------+
| 2011-01-01 09:02:26 |        1 |
| 2011-01-01 09:14:51 |        1 |
| 2011-01-01 10:40:33 |      549 |
| 2011-01-01 11:10:27 |       38 |
| 2011-01-01 11:31:50 |       82 |
+---------------------+----------+

Each of these records now needs to be matched to the closest datetime key from the pre-generated array given above and the duration value added to the existing value of the match.

The Problem:
It's easy enough constructing two nested for loops to interate through the records from the database and then linearly run through the associative array to find a match, but this is hugely inefficient, and becomes problematic for large datasets (think bubblesort, that's what this would be roughly equivalent to). A slightly better approach is to linearly loop over the records from the database, and then iterate over the array as a binary tree, which is certainly a lot more efficient, and is possible since both arrays are sorted chronologically.

The Question:
Is there a more efficient way of handling this date matching than how I've described in the above problem?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一抹淡然 2024-12-05 01:54:34

将日期的 UNIX_TIMESTAMP() 除以 30 * 60 秒（30 分钟）并使用 ROUND() 作为整数怎么样？然后，使用 GROUP BY 对它们进行分组，最后使用 SUM() 对持续时间求和。

SELECT SUM(duration), ROUND(UNIX_TIMESTAMP(datetime) / (30 * 60)) FROM table GROUP BY ROUND(UNIX_TIMESTAMP(datetime) / (30 * 60))

What about dividing the UNIX_TIMESTAMP() of the date by 30 * 60 seconds (30 minutes) and using ROUND() for the integer. Then, use GROUP BY to group them, and finally, SUM() to sum the durations.

SELECT SUM(duration), ROUND(UNIX_TIMESTAMP(datetime) / (30 * 60)) FROM table GROUP BY ROUND(UNIX_TIMESTAMP(datetime) / (30 * 60))

回复收藏 0 原文

天生の放荡 2024-12-05 01:54:34

您当前的算法（如果我理解正确的话）如下所示：

获取所有记录
将每条记录的日期时间与数组键进行比较
根据记录增加适当数组值的总和

更有效的方法可能是：

查询与给定时间子集匹配的所有记录的持续时间
使用适当的键将总和插入到数组中

这允许 MySQL 提供优化的数学逻辑并减少脚本所需的数组迭代次数。您将增加数据库查询的数量，但基准测试会告诉您这种权衡是否值得。

回复收藏 0 原文

~没有更多了~

关于作者

归属感

暂无简介

0 文章

0 评论

20978 人气

关注发私信

友情链接

文江博客

PHP 中的日期匹配和插值

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

PHP 中的日期匹配和插值

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。