What is the most straightforward way to pad empty dates in SQL results (on either the MySQL or Perl end)?

Posted 2024-07-04 04:00:13 · 862 characters · 5 views · 0 comments

I'm building a quick csv from a mysql table with a query like:

select DATE(date),count(date) from table group by DATE(date) order by date asc;

and just dumping the rows to a file in perl with a loop:

while(my($date,$sum) = $sth->fetchrow) {
    print CSV "$date,$sum\n"
}

There are date gaps in the data, though:

| 2008-08-05 |           4 | 
| 2008-08-07 |          23 | 

I would like to pad the data to fill in the missing days with zero-count entries to end up with:

| 2008-08-05 |           4 | 
| 2008-08-06 |           0 | 
| 2008-08-07 |          23 | 

I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.

Any genius ideas/slaps in the face for why I am being so dumb?


I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:

  • I know the date range I'll be looking for every time
  • The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed

The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!

Comments (8)

如若梦似彩虹 2024-07-11 04:00:14

Use a Perl module to do the date calculations, such as the recommended DateTime or Time::Piece (core since 5.10). Just increment the date and print the date and 0 until the date matches the current row's date.
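The increment-and-print idea can be sketched with the core Time::Piece module alone, which may help when nothing can be installed from CPAN. This is a minimal sketch, not the answerer's code: `pad_rows` is a hypothetical helper name, and it assumes the rows arrive sorted by date as 'YYYY-MM-DD' strings, the way the query in the question returns them.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # provides the ONE_DAY constant

# Hypothetical helper: takes [date, count] rows sorted by date and
# returns them with a zero-count row added for every missing day.
sub pad_rows {
    my @rows = @_;
    my @out;
    my $expected;     # Time::Piece for the next date we expect to see
    for my $row (@rows) {
        my ($date, $sum) = @$row;
        my $this = Time::Piece->strptime($date, '%Y-%m-%d');
        # emit zero rows until the counter catches up with this row
        while (defined $expected && $expected < $this) {
            push @out, [ $expected->ymd, 0 ];
            $expected += ONE_DAY;
        }
        push @out, [ $date, $sum ];
        $expected = $this + ONE_DAY;
    }
    return @out;
}

# Sample data from the question
print join(',', @$_), "\n"
    for pad_rows([ '2008-08-05', 4 ], [ '2008-08-07', 23 ]);
```

In the real script the rows would come from `$sth->fetchrow` inside the existing loop instead of a literal list. Time::Piece->strptime parses the dates as UTC, so adding ONE_DAY should not be affected by daylight-saving changes.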

习ぎ惯性依靠 2024-07-11 04:00:14

I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.

泪痕残 2024-07-11 04:00:13

I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).

CREATE TABLE IF NOT EXISTS `Ordinal` (
  `n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc

Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:

SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC

The two things you would need to change about this are the starting point and the interval. I have used SET @var = 'value' syntax for clarity.

SET @end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET @begin = @end - INTERVAL 3 MONTH;
SET @period = DATEDIFF(@end, @begin);

# `n` starts at 1 (AUTO_INCREMENT), so adding `n` days with `n` <= @period
# covers every day from @begin + 1 day through @end
SELECT @begin + INTERVAL `n` DAY AS `date`
FROM `Ordinal` WHERE `n` <= @period
ORDER BY `n` ASC;

So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:

SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
    SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL `n` DAY AS `date`
    FROM `Ordinal`
    WHERE `n` <= (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
) AS `ord`
LEFT JOIN `Message` AS `msg`
  ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
ORDER BY `ord`.`date` ASC;

Tips and Comments:

  • Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
  • You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
  • You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
  • You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.

诠释孤独 2024-07-11 04:00:13

Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:

use DateTime;
use DateTime::Format::Strptime qw(strptime);  # strptime() is an optional export

my @row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $row[0]);
my $thisdate = strptime("%Y-%m-%d", $row[0]);

while (@row) {
  # keep advancing countdate until it hits the next db row date
  if(DateTime->compare($countdate, $thisdate) == -1) {
    # counter has not reached the next date yet
    print CSV $countdate->ymd . ",0\n";
    $countdate->add( days => 1 );
    next;
  }

  # countdate is equal to this row's date, so print the row instead
  print CSV $thisdate->ymd . ",$row[1]\n";

  # advance both, stopping once the result set is exhausted
  @row = $sth->fetchrow or last;
  $thisdate = strptime("%Y-%m-%d", $row[0]);
  $countdate->add( days => 1 );
}

Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!

十六岁半 2024-07-11 04:00:13

When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:

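delimiter //
create procedure sp1(d1 date, d2 date)
begin
  declare d date;

  create temporary table foo (d date not null);

  -- fill foo with one row per day from d1 through d2
  set d = d1;
  while d <= d2 do
    insert into foo (d) values (d);
    set d = date_add(d, interval 1 day);
  end while;

  -- `table` is the placeholder name from the question; TABLE is a reserved
  -- word, so it needs backticks. DATE() strips any time part before the join.
  select foo.d, count(t.date)
  from foo left join `table` as t on foo.d = date(t.date)
  group by foo.d order by foo.d asc;

  drop temporary table foo;
end //
delimiter ;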

In this particular case it would be better to put a little check on the client side: if the current date is not the previous date + 1, output some additional rows.

莫言歌 2024-07-11 04:00:13

When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.

SELECT DATE(r.date),count(d.date) 
FROM dates AS r 
LEFT JOIN table AS d ON d.date = r.date 
GROUP BY DATE(r.date) 
ORDER BY r.date ASC;

As for output, I'd just use SELECT ... INTO OUTFILE instead of generating the CSV by hand. That also frees us from worrying about escaping special characters.
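The reference table has to be filled with dates somehow. One way to do that, sketched here under assumptions, is to generate the INSERT statement from Perl with the core Time::Piece module. `dates_insert_sql` is a hypothetical helper; the table name `dates` and column name `date` are taken from the query above, and the start date is assumed to be no later than the end date.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # provides the ONE_DAY constant

# Hypothetical helper: builds one multi-row INSERT covering every day
# from $first through $last (both 'YYYY-MM-DD', $first <= $last assumed).
sub dates_insert_sql {
    my ($first, $last) = @_;
    my $day = Time::Piece->strptime($first, '%Y-%m-%d');
    my $end = Time::Piece->strptime($last,  '%Y-%m-%d');
    my @values;
    while ($day <= $end) {
        push @values, "('" . $day->ymd . "')";
        $day += ONE_DAY;
    }
    return 'INSERT INTO `dates` (`date`) VALUES ' . join(', ', @values) . ';';
}

print dates_insert_sql('2008-08-05', '2008-08-07'), "\n";
```

The printed statement can be piped into the mysql client or passed to `$dbh->do`. For very long ranges it may be worth splitting the statement into batches.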

眼眸里的那抹悲凉 2024-07-11 04:00:13

Not dumb: inserting the empty date values isn't something that MySQL does for you. I do this in Perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...

use Date::EzDate;

my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
    # EzDate provides automatic stringification in the format specified in 'default';
    # dates that are missing from the hash are printed with a count of 0
    print "$current_date\t|\t" . ($hash_o_data{"$current_date"} || 0) . "\n";
    $current_date++;
}

where final date is another EzDate object or a string containing the end of your date range.

EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
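Since EzDate may not be available, the same two-step idea (hash first, then walk the date range) could be sketched with the core Time::Piece module instead. This is an illustration under assumptions, not the answerer's code: `fill_range` is a hypothetical helper, and the hash keys are assumed to be 'YYYY-MM-DD' strings as returned by DATE(date) in the question's query.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # provides the ONE_DAY constant

# Hypothetical helper: $counts maps 'YYYY-MM-DD' => count. Returns one
# "date,count" line per day from $first through $last, using 0 for days
# that are missing from the hash.
sub fill_range {
    my ($counts, $first, $last) = @_;
    my $day = Time::Piece->strptime($first, '%Y-%m-%d');
    my $end = Time::Piece->strptime($last,  '%Y-%m-%d');
    my @lines;
    while ($day <= $end) {
        my $key = $day->ymd;
        push @lines, "$key," . ($counts->{$key} || 0);
        $day += ONE_DAY;
    }
    return @lines;
}

# Step 1 would normally fill this hash from the query results
my %hash_o_data = ('2008-08-05' => 4, '2008-08-07' => 23);

# Step 2: walk the whole range
print "$_\n" for fill_range(\%hash_o_data, '2008-08-05', '2008-08-07');
```

Unlike the streaming approaches, this holds the whole result set in memory, which is usually fine for one row per day.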

痴骨ら 2024-07-11 04:00:13

You could use a DateTime object:

use DateTime;
my $dt;

while ( my ($date, $sum) = $sth->fetchrow )  {
    if (defined $dt) {
        print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
    }
    else {
        my ($y, $m, $d) = split /-/, $date;
        $dt = DateTime->new(year => $y, month => $m, day => $d);
    }
    print CSV "$date,$sum\n";
}

The above code keeps the last printed date stored in a
DateTime object $dt. When the current row's date is more than one day
ahead of it, the code increments $dt by one day (printing a zero-count
line to CSV each time) until it reaches the current row's date.

This way you don't need extra tables, and don't need to fetch all your
rows in advance.
