是否可以在 SQL SELECT 查询中即时临时复制和修改行?
我的应用程序刚刚收到一个新的数据源,仅在以下情况下才会将数据插入 Derby 数据库它改变。 通常,丢失数据是可以的 - 我正在用数据(随时间变化的值)绘制折线图,并且我只需在两点之间画一条线,推断出任何给定点的预期值。 问题是,在这种情况下,缺少数据意味着“画一条直线”,如果我这样做,图表将不正确。
有两种方法可以解决这个问题:我可以创建一个新类,以不同的方式处理丢失的数据(由于我正在使用的绘图库 prefuse 处理绘图的方式,这可能很困难),或者我可以复制行,保持 y 值相同,同时更改每行中的 x 值。 我可以在连接数据库和渲染器的 Java 中执行此操作,或者我可以修改 SQL。
我的问题是,给定如下所示的结果集:
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 10 | 2000-01-01 08:00:05 |
| 11 | 2000-01-01 08:00:07 |
| 2 | 2000-01-01 08:00:13 |
| 4 | 2000-01-01 08:00:16 |
+-------+---------------------+
假设我在 8:00:20 查询它,如何使用 SQL 使它看起来像下面这样? 基本上,我每秒都会复制该行,直到它被占用为止。 出于所有意图和目的,received
是唯一的(事实并非如此,但这是由于查询中的 WHERE
子句造成的)。
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:01 |
| 7 | 2000-01-01 08:00:02 |
| 7 | 2000-01-01 08:00:03 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:08 |
| 11 | 2000-01-01 08:00:09 |
| 11 | 2000-01-01 08:00:10 |
| 11 | 2000-01-01 08:00:11 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:14 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:17 |
| 4 | 2000-01-01 08:00:18 |
| 4 | 2000-01-01 08:00:19 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
感谢您的帮助。
I've just received a new data source for my application which inserts data into a Derby database only when it changes. Normally, missing data is fine - I'm drawing a line chart with the data (value over time), and I'd just draw a line between the two points, extrapolating the expected value at any given point. The problem is that as missing data in this case means "draw a straight line," the graph would be incorrect if I did this.
There are two ways I could fix this: I could create a new class that handles missing data differently (which could be difficult due to the way prefuse, the drawing library I'm using, handles drawing), or I could duplicate the rows, leaving the y value the same while changing the x value in each row. I could do this in the Java that bridges the database and the renderer, or I could modify the SQL.
My question is, given a result set like the one below:
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 10 | 2000-01-01 08:00:05 |
| 11 | 2000-01-01 08:00:07 |
| 2 | 2000-01-01 08:00:13 |
| 4 | 2000-01-01 08:00:16 |
+-------+---------------------+
Assuming I query it at 8:00:20, how can I make it look like the following using SQL? Basically, I'm duplicating the row for every second until it's already taken. received
is, for all intents and purposes, unique (it's not, but it will be due to the WHERE
clause in the query).
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:01 |
| 7 | 2000-01-01 08:00:02 |
| 7 | 2000-01-01 08:00:03 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:08 |
| 11 | 2000-01-01 08:00:09 |
| 11 | 2000-01-01 08:00:10 |
| 11 | 2000-01-01 08:00:11 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:14 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:17 |
| 4 | 2000-01-01 08:00:18 |
| 4 | 2000-01-01 08:00:19 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
Thanks for your help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
由于 SQL 基于集合的性质,没有简单的方法可以做到这一点。 我使用了两种解决方案策略:
a)使用循环从初始日期时间到结束日期时间,并为每个步骤获取值,并将其插入临时表中
b)用1分钟生成一个表(正常或临时)增量,将基准日期时间添加到此表中,您可以生成步骤。
方法 b) 示例(SQL Server 版本)
假设我们永远不会查询超过 24 小时的数据。 我们创建一个表intervals,其中包含一个 dttm 字段,其中包含每个步骤的分钟计数。 该表必须事先填充。
Due to the set based nature of SQL, there's no simple way to do this. I have used two solution strategies:
a) use a cycle to go from the initial to end date time and for each step get the value, and insert that into a temp table
b) generate a table (normal or temporary) with the 1 minute increments, adding the base date time to this table you can generate the steps.
Example of approach b) (SQL Server version)
Let's assume we will never query more than 24 hours of data. We create a table intervals that has a dttm field with the minute count for each step. That table must be populated previously.
在这种情况下,您似乎确实不需要生成所有这些数据点。 生成以下内容是否正确? 如果它画一条直线,则不需要每秒生成一个数据点,只需为每个数据点生成两个数据点......一个在当前时间,一个在下一次之前。 此示例从下一次减去 5 毫秒,但如果需要,您可以将其设置为一整秒。
如果是这种情况,那么您可以执行以下操作:
这样做的最大优点是它是基于集合的解决方案,并且比任何迭代方法都要快得多。
It seems like in this case you really don't need to generate all of these datapoints. Would it be correct to generate the following instead? If it's drawing a straight line, you don't need go generate a data point for each second, just two for each datapoint...one at the current time, one right before the next time. This example subtracts 5 ms from the next time, but you could make it a full second if you need it.
If that's the case, then you can do the following:
The big advantage of this is that it is a set based solution and will be much faster than any iterative approach.
您可以只移动光标,保留最后一个值的变量& 返回时间,如果当前时间领先一秒以上,则使用前一个值和新时间一次循环一秒,直到获得当前行的时间。
尝试在 SQL 中执行此操作会很痛苦,并且如果您创建了丢失的数据,则可能必须添加一列来跟踪实际/插值数据点。
You could just walk a cursor, keep vars for the last value & time returned, and if the current one is more than a second ahead, loop one second at a time using the previous value and the new time until you get the the current row's time.
Trying to do this in SQL would be painful, and if you went and created the missing data, you would possible have to add a column to track real / interpolated data points.
更好的办法是为图表上想要的每个轴值建立一个表,然后加入其中,或者甚至只是将数据字段放在那里,并在值到达时更新该记录。
“缺失值”问题相当广泛,因此我建议您制定可靠的政策。
将会发生的一件事是,您将有多个带有缺失值的相邻槽。
如果您可以将其转换为 OLAP 数据,这会容易得多。
Better would be to have a table for each axial value you want to have on the graph, and then either join to it or even just put the data field there and update that record when/if values arrive.
The "missing values" problem is quite extensive, so I suggest you have a solid policy.
One thing that will happen is that you will have multiple adjacent slots with missing values.
This would be much easier if you could transform it into OLAP data.
创建一个包含所有分钟的简单表(警告,将运行一段时间):
然后您可以像这样使用它:
Create a simple table that has all the minutes (warning, will run for a while):
You can then use it somewhat like this:
我建议不要在 SQL/数据库中解决这个问题,因为它基于集合的性质。
另外,您在这里处理的是秒数,所以我想您最终可能会得到很多行,具有相同的重复数据,必须将其从数据库传输到您的应用程序。
I would recommend against solving this in SQL/the database due to the set based nature of it.
Also you are dealing with seconds here so I guess you could end up with a lot of rows, with the same repeated data, that would have to be transfered from the database to you application.
处理此问题的一种方法是将数据左连接到包含所有接收值的表。 然后,当该行没有值时,您可以根据您拥有的上一个和下一个实际值计算预计值应为多少。
你没有说你使用什么数据库平台。 在 SQL Server 中,我将创建一个接受开始日期时间和结束日期时间值的用户定义函数。 它将返回一个表值,其中包含您需要的所有收到值。
我在下面模拟了它,它在 SQL Server 中运行。 别名为 r 的子选择是用户定义函数实际返回的内容。
One way to handle this is to left join your data against a table that contains all of the received values. Then, when there is no value for that row, you calculate what the projected value should be based on the previous and next actual values you have.
You didn't say what database platform you are using. In SQL Server, I would create a User Defined Function that accepts a start datetime and end datetime value. It would return a table value with all of the received values you need.
I have simulated it below, which runs in SQL Server. The subselect aliased r is what would actually get returned by the user defined function.
如果您使用 SQL Server,那么这将是一个好的开始。 我不确定 Apache 的 Derby 与 sql 有多接近。
If you were in SQL Server, then this would be a good start. I am not sure how close Apache's Derby is to sql.
作为一个想法,您可能需要查看 Anthony Mollinaro 的 SQL Cookbook,第 9 章。他有一个食谱,“填写缺失的日期”(查看第 278-281 页),主要讨论您想要做什么。 它需要某种顺序处理,要么通过辅助表,要么递归地执行查询。 虽然他没有直接提供 Derby 的示例,但我怀疑您可能可以根据您的问题调整它们(特别是 PostgreSQL 或 MySQL 的示例,这似乎与平台无关)。
As just an idea, you might want to check out Anthony Mollinaro's SQL Cookbook, chapter 9. He has a recipe, "Filling in Missing Dates" (check out pages 278-281), that discusses primarily what you are trying to do. It requires some sort of sequential handling, either via a helper table or doing the query recursively. While he doesn't have examples for Derby directly, I suspect you could probably adapt them to your problem (particularly the PostgreSQL or MySQL one, it seems somewhat platform agnostic).