是否可以在 SQL SELECT 查询中即时临时复制和修改行？

发布于 2024-07-24 17:42:44 字数 1749 浏览 8 评论 0原文

我的应用程序刚刚收到一个新的数据源，仅在以下情况下才会将数据插入 Derby 数据库它改变。通常，丢失数据是可以的 - 我正在用数据（随时间变化的值）绘制折线图，并且我只需在两点之间画一条线，推断出任何给定点的预期值。问题是，在这种情况下，缺少数据意味着“画一条直线”，如果我这样做，图表将不正确。

有两种方法可以解决这个问题：我可以创建一个新类，以不同的方式处理丢失的数据（由于我正在使用的绘图库 prefuse 处理绘图的方式，这可能很困难），或者我可以复制行，保持 y 值相同，同时更改每行中的 x 值。我可以在连接数据库和渲染器的 Java 中执行此操作，或者我可以修改 SQL。

我的问题是，给定如下所示的结果集：

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|    10 | 2000-01-01 08:00:05 |
|    11 | 2000-01-01 08:00:07 |
|     2 | 2000-01-01 08:00:13 |
|     4 | 2000-01-01 08:00:16 |
+-------+---------------------+

假设我在 8:00:20 查询它，如何使用 SQL 使它看起来像下面这样？基本上，我每秒都会复制该行，直到它被占用为止。出于所有意图和目的，received 是唯一的（事实并非如此，但这是由于查询中的 WHERE 子句造成的）。

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:01 |
|     7 | 2000-01-01 08:00:02 |
|     7 | 2000-01-01 08:00:03 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:08 |
|    11 | 2000-01-01 08:00:09 |
|    11 | 2000-01-01 08:00:10 |
|    11 | 2000-01-01 08:00:11 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:14 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:17 |
|     4 | 2000-01-01 08:00:18 |
|     4 | 2000-01-01 08:00:19 |
|     4 | 2000-01-01 08:00:20 |
+-------+---------------------+

感谢您的帮助。

原文

I've just received a new data source for my application which inserts data into a Derby database only when it changes. Normally, missing data is fine - I'm drawing a line chart with the data (value over time), and I'd just draw a line between the two points, extrapolating the expected value at any given point. The problem is that as missing data in this case means "draw a straight line," the graph would be incorrect if I did this.

There are two ways I could fix this: I could create a new class that handles missing data differently (which could be difficult due to the way prefuse, the drawing library I'm using, handles drawing), or I could duplicate the rows, leaving the y value the same while changing the x value in each row. I could do this in the Java that bridges the database and the renderer, or I could modify the SQL.

My question is, given a result set like the one below:

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|    10 | 2000-01-01 08:00:05 |
|    11 | 2000-01-01 08:00:07 |
|     2 | 2000-01-01 08:00:13 |
|     4 | 2000-01-01 08:00:16 |
+-------+---------------------+

Assuming I query it at 8:00:20, how can I make it look like the following using SQL? Basically, I'm duplicating the row for every second until it's already taken. received is, for all intents and purposes, unique (it's not, but it will be due to the WHERE clause in the query).

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:01 |
|     7 | 2000-01-01 08:00:02 |
|     7 | 2000-01-01 08:00:03 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:08 |
|    11 | 2000-01-01 08:00:09 |
|    11 | 2000-01-01 08:00:10 |
|    11 | 2000-01-01 08:00:11 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:14 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:17 |
|     4 | 2000-01-01 08:00:18 |
|     4 | 2000-01-01 08:00:19 |
|     4 | 2000-01-01 08:00:20 |
+-------+---------------------+

Thanks for your help.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

始终不够 2024-07-31 17:42:44

由于 SQL 基于集合的性质，没有简单的方法可以做到这一点。我使用了两种解决方案策略：

a）使用循环从初始日期时间到结束日期时间，并为每个步骤获取值，并将其插入临时表中

b）用1分钟生成一个表（正常或临时）增量，将基准日期时间添加到此表中，您可以生成步骤。

方法 b) 示例（SQL Server 版本）

假设我们永远不会查询超过 24 小时的数据。我们创建一个表intervals，其中包含一个 dttm 字段，其中包含每个步骤的分钟计数。该表必须事先填充。

select dateadd(minute,stepMinutes,'2000-01-01 08:00') received,
(select top 1 value from table where received <= 
dateadd(minute,dttm,'2000-01-01 08:00') 
order by received desc) value
from intervals

Due to the set based nature of SQL, there's no simple way to do this. I have used two solution strategies:

a) use a cycle to go from the initial to end date time and for each step get the value, and insert that into a temp table

b) generate a table (normal or temporary) with the 1 minute increments, adding the base date time to this table you can generate the steps.

Example of approach b) (SQL Server version)

Let's assume we will never query more than 24 hours of data. We create a table intervals that has a dttm field with the minute count for each step. That table must be populated previously.

select dateadd(minute,stepMinutes,'2000-01-01 08:00') received,
(select top 1 value from table where received <= 
dateadd(minute,dttm,'2000-01-01 08:00') 
order by received desc) value
from intervals

回复收藏 0 原文

Smile简单爱 2024-07-31 17:42:44

在这种情况下，您似乎确实不需要生成所有这些数据点。生成以下内容是否正确？如果它画一条直线，则不需要每秒生成一个数据点，只需为每个数据点生成两个数据点......一个在当前时间，一个在下一次之前。此示例从下一次减去 5 毫秒，但如果需要，您可以将其设置为一整秒。

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:20D |
+-------+---------------------+

如果是这种情况，那么您可以执行以下操作：

SELECT * FROM
(SELECT * from TimeTable as t1
UNION
SELECT t2.value, dateadd(ms, -5, t2.received)
from ( Select t3.value, (select top 1 t4.received  
                         from TimeTable t4 
                         where t4.received > t3.received
                         order by t4.received asc) as received
from TimeTable t3) as t2
UNION
SELECT top 1 t6.value, GETDATE()
from TimeTable t6
order by t6.received desc
) as t5
where received IS NOT NULL
order by t5.received

这样做的最大优点是它是基于集合的解决方案，并且比任何迭代方法都要快得多。

It seems like in this case you really don't need to generate all of these datapoints. Would it be correct to generate the following instead? If it's drawing a straight line, you don't need go generate a data point for each second, just two for each datapoint...one at the current time, one right before the next time. This example subtracts 5 ms from the next time, but you could make it a full second if you need it.

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:20D |
+-------+---------------------+

If that's the case, then you can do the following:

SELECT * FROM
(SELECT * from TimeTable as t1
UNION
SELECT t2.value, dateadd(ms, -5, t2.received)
from ( Select t3.value, (select top 1 t4.received  
                         from TimeTable t4 
                         where t4.received > t3.received
                         order by t4.received asc) as received
from TimeTable t3) as t2
UNION
SELECT top 1 t6.value, GETDATE()
from TimeTable t6
order by t6.received desc
) as t5
where received IS NOT NULL
order by t5.received

The big advantage of this is that it is a set based solution and will be much faster than any iterative approach.

回复收藏 0 原文

友谊不毕业 2024-07-31 17:42:44

您可以只移动光标，保留最后一个值的变量& 返回时间，如果当前时间领先一秒以上，则使用前一个值和新时间一次循环一秒，直到获得当前行的时间。

尝试在 SQL 中执行此操作会很痛苦，并且如果您创建了丢失的数据，则可能必须添加一列来跟踪实际/插值数据点。

回复收藏 0 原文

故事↓在人 2024-07-31 17:42:44

更好的办法是为图表上想要的每个轴值建立一个表，然后加入其中，或者甚至只是将数据字段放在那里，并在值到达时更新该记录。

“缺失值”问题相当广泛，因此我建议您制定可靠的政策。

将会发生的一件事是，您将有多个带有缺失值的相邻槽。

如果您可以将其转换为 OLAP 数据，这会容易得多。

回复收藏 0 原文

错爱 2024-07-31 17:42:44

创建一个包含所有分钟的简单表（警告，将运行一段时间）：

Create Table Minutes(Value DateTime Not Null)
Go

Declare @D DateTime
Set @D = '1/1/2000'

While (Year(@D) < 2002)
Begin
  Insert Into Minutes(Value) Values(@D)
  Set @D = DateAdd(Minute, 1, @D)
End
Go


Create Clustered Index IX_Minutes On Minutes(Value)
Go

然后您可以像这样使用它：

Select 
  Received = Minutes.Value,
  Value = (Select Top 1 Data.Value
           From Data
           Where Data.Received <= Minutes.Received
           Order By Data.Received Desc)
From
  Minutes
Where
  Minutes.Value Between @Start And @End

Create a simple table that has all the minutes (warning, will run for a while):

Create Table Minutes(Value DateTime Not Null)
Go

Declare @D DateTime
Set @D = '1/1/2000'

While (Year(@D) < 2002)
Begin
  Insert Into Minutes(Value) Values(@D)
  Set @D = DateAdd(Minute, 1, @D)
End
Go


Create Clustered Index IX_Minutes On Minutes(Value)
Go

You can then use it somewhat like this:

Select 
  Received = Minutes.Value,
  Value = (Select Top 1 Data.Value
           From Data
           Where Data.Received <= Minutes.Received
           Order By Data.Received Desc)
From
  Minutes
Where
  Minutes.Value Between @Start And @End

回复收藏 0 原文

堇色安年 2024-07-31 17:42:44

我建议不要在 SQL/数据库中解决这个问题，因为它基于集合的性质。
另外，您在这里处理的是秒数，所以我想您最终可能会得到很多行，具有相同的重复数据，必须将其从数据库传输到您的应用程序。

回复收藏 0 原文

女皇必胜 2024-07-31 17:42:44

处理此问题的一种方法是将数据左连接到包含所有接收值的表。然后，当该行没有值时，您可以根据您拥有的上一个和下一个实际值计算预计值应为多少。

你没有说你使用什么数据库平台。在 SQL Server 中，我将创建一个接受开始日期时间和结束日期时间值的用户定义函数。它将返回一个表值，其中包含您需要的所有收到值。

我在下面模拟了它，它在 SQL Server 中运行。别名为 r 的子选择是用户定义函数实际返回的内容。

select r.received,
isnull(d.value,(select top 1 data.value from data where data.received < r.received order by data.received desc)) as x
from (
    select cast('2000-01-01 08:00:00' as datetime) received
    union all
    select cast('2000-01-01 08:00:01' as datetime)
    union all
    select cast('2000-01-01 08:00:02' as datetime)
    union all
    select cast('2000-01-01 08:00:03' as datetime)
    union all
    select cast('2000-01-01 08:00:04' as datetime)
    union all
    select cast('2000-01-01 08:00:05' as datetime)
    union all
    select cast('2000-01-01 08:00:06' as datetime)
    union all
    select cast('2000-01-01 08:00:07' as datetime)
    union all
    select cast('2000-01-01 08:00:08' as datetime)
    union all
    select cast('2000-01-01 08:00:09' as datetime)
    union all
    select cast('2000-01-01 08:00:10' as datetime)
    union all
    select cast('2000-01-01 08:00:11' as datetime)
    union all
    select cast('2000-01-01 08:00:12' as datetime)
    union all
    select cast('2000-01-01 08:00:13' as datetime)
    union all
    select cast('2000-01-01 08:00:14' as datetime)
    union all
    select cast('2000-01-01 08:00:15' as datetime)
    union all
    select cast('2000-01-01 08:00:16' as datetime)
    union all
    select cast('2000-01-01 08:00:17' as datetime)
    union all
    select cast('2000-01-01 08:00:18' as datetime)
    union all
    select cast('2000-01-01 08:00:19' as datetime)
    union all
    select cast('2000-01-01 08:00:20' as datetime)
) r
left outer join Data d on r.received = d.received

One way to handle this is to left join your data against a table that contains all of the received values. Then, when there is no value for that row, you calculate what the projected value should be based on the previous and next actual values you have.

You didn't say what database platform you are using. In SQL Server, I would create a User Defined Function that accepts a start datetime and end datetime value. It would return a table value with all of the received values you need.

I have simulated it below, which runs in SQL Server. The subselect aliased r is what would actually get returned by the user defined function.

select r.received,
isnull(d.value,(select top 1 data.value from data where data.received < r.received order by data.received desc)) as x
from (
    select cast('2000-01-01 08:00:00' as datetime) received
    union all
    select cast('2000-01-01 08:00:01' as datetime)
    union all
    select cast('2000-01-01 08:00:02' as datetime)
    union all
    select cast('2000-01-01 08:00:03' as datetime)
    union all
    select cast('2000-01-01 08:00:04' as datetime)
    union all
    select cast('2000-01-01 08:00:05' as datetime)
    union all
    select cast('2000-01-01 08:00:06' as datetime)
    union all
    select cast('2000-01-01 08:00:07' as datetime)
    union all
    select cast('2000-01-01 08:00:08' as datetime)
    union all
    select cast('2000-01-01 08:00:09' as datetime)
    union all
    select cast('2000-01-01 08:00:10' as datetime)
    union all
    select cast('2000-01-01 08:00:11' as datetime)
    union all
    select cast('2000-01-01 08:00:12' as datetime)
    union all
    select cast('2000-01-01 08:00:13' as datetime)
    union all
    select cast('2000-01-01 08:00:14' as datetime)
    union all
    select cast('2000-01-01 08:00:15' as datetime)
    union all
    select cast('2000-01-01 08:00:16' as datetime)
    union all
    select cast('2000-01-01 08:00:17' as datetime)
    union all
    select cast('2000-01-01 08:00:18' as datetime)
    union all
    select cast('2000-01-01 08:00:19' as datetime)
    union all
    select cast('2000-01-01 08:00:20' as datetime)
) r
left outer join Data d on r.received = d.received

回复收藏 0 原文

非要怀念 2024-07-31 17:42:44

如果您使用 SQL Server，那么这将是一个好的开始。我不确定 Apache 的 Derby 与 sql 有多接近。

Usage: EXEC ElaboratedData '2000-01-01 08:00:00','2000-01-01 08:00:20'

CREATE PROCEDURE [dbo].[ElaboratedData]
  @StartDate DATETIME,
  @EndDate DATETIME
AS
  --if not a valid interval, just quit
  IF @EndDate<=@StartDate BEGIN
    SELECT 0;    
    RETURN;
  END;

  /*
  Store the value of 1 second locally, for readability
  --*/
  DECLARE @OneSecond FLOAT;
  SET @OneSecond = (1.00000000/86400.00000000);

  /*
  create a temp table w/the same structure as the real table.
  --*/
  CREATE TABLE #SecondIntervals(TSTAMP DATETIME, DATAPT INT);

  /*
  For each second in the interval, check to see if we have a known value.
  If we do, then use that.  If not, make one up.
  --*/ 
  DECLARE @CurrentSecond DATETIME; 
  SET @CurrentSecond = @StartDate;
  WHILE @CurrentSecond <= @EndDate BEGIN
    DECLARE @KnownValue INT;

    SELECT @KnownValue=DATAPT
    FROM TESTME
    WHERE TSTAMP = @CurrentSecond;

    IF (0 = ISNULL(@KnownValue,0)) BEGIN
      --ok, we have to make up a fake value
      DECLARE @MadeUpValue INT;
      /*
      *******Put whatever logic you want to make up a fake value here
      --*/
      SET @MadeUpValue = 99;

      INSERT INTO #SecondIntervals(
        TSTAMP
       ,DATAPT
      )
      VALUES(
        @CurrentSecond
       ,@MadeUpValue
      );
    END;  --if we had to make up a value
    SET @CurrentSecond = @CurrentSecond + @OneSecond;
  END;  --while looking thru our values

  --finally, return our generated values + real values
  SELECT TSTAMP, DATAPT FROM #SecondIntervals
  UNION ALL
  SELECT TSTAMP, DATAPT FROM TESTME
  ORDER BY TSTAMP;
GO

If you were in SQL Server, then this would be a good start. I am not sure how close Apache's Derby is to sql.

Usage: EXEC ElaboratedData '2000-01-01 08:00:00','2000-01-01 08:00:20'

CREATE PROCEDURE [dbo].[ElaboratedData]
  @StartDate DATETIME,
  @EndDate DATETIME
AS
  --if not a valid interval, just quit
  IF @EndDate<=@StartDate BEGIN
    SELECT 0;    
    RETURN;
  END;

  /*
  Store the value of 1 second locally, for readability
  --*/
  DECLARE @OneSecond FLOAT;
  SET @OneSecond = (1.00000000/86400.00000000);

  /*
  create a temp table w/the same structure as the real table.
  --*/
  CREATE TABLE #SecondIntervals(TSTAMP DATETIME, DATAPT INT);

  /*
  For each second in the interval, check to see if we have a known value.
  If we do, then use that.  If not, make one up.
  --*/ 
  DECLARE @CurrentSecond DATETIME; 
  SET @CurrentSecond = @StartDate;
  WHILE @CurrentSecond <= @EndDate BEGIN
    DECLARE @KnownValue INT;

    SELECT @KnownValue=DATAPT
    FROM TESTME
    WHERE TSTAMP = @CurrentSecond;

    IF (0 = ISNULL(@KnownValue,0)) BEGIN
      --ok, we have to make up a fake value
      DECLARE @MadeUpValue INT;
      /*
      *******Put whatever logic you want to make up a fake value here
      --*/
      SET @MadeUpValue = 99;

      INSERT INTO #SecondIntervals(
        TSTAMP
       ,DATAPT
      )
      VALUES(
        @CurrentSecond
       ,@MadeUpValue
      );
    END;  --if we had to make up a value
    SET @CurrentSecond = @CurrentSecond + @OneSecond;
  END;  --while looking thru our values

  --finally, return our generated values + real values
  SELECT TSTAMP, DATAPT FROM #SecondIntervals
  UNION ALL
  SELECT TSTAMP, DATAPT FROM TESTME
  ORDER BY TSTAMP;
GO

回复收藏 0 原文

沉鱼一梦 2024-07-31 17:42:44

作为一个想法，您可能需要查看 Anthony Mollinaro 的 SQL Cookbook，第 9 章。他有一个食谱，“填写缺失的日期”（查看第 278-281 页），主要讨论您想要做什么。它需要某种顺序处理，要么通过辅助表，要么递归地执行查询。虽然他没有直接提供 Derby 的示例，但我怀疑您可能可以根据您的问题调整它们（特别是 PostgreSQL 或 MySQL 的示例，这似乎与平台无关）。

回复收藏 0 原文

~没有更多了~