SQLite3中计算多列平均值
我需要以行方式对一些值进行平均,而不是列方式。 (如果我要按列求平均值,我可以使用 avg()
)。我对此的具体应用要求我在求平均值时忽略 NULL。这是非常简单的逻辑,但在 SQL 中似乎很难做到。有没有一种优雅的方法来进行计算?
我正在使用 SQLite3,因为它的价值。
详细信息
如果您需要更多详细信息,请参阅以下示例:
我有一个包含调查的表:(
| q1 | q2 | q3 | ... | q144 |
|----|-------|-------|-----|------|
| 1 | 3 | 7 | ... | 2 |
| 4 | 2 | NULL | ... | 1 |
| 5 | NULL | 2 | ... | 3 |
这些只是一些示例值和简单的列名称。有效值为 1 到 7 和 NULL。)
I需要像这样计算一些平均值:
q7 + q33 + q38 + q40 + ... + q119 / 11 as domain_score_1
q10 + q11 + q34 + q35 + ... + q140 / 13 as domain_score_2
...
q2 + q5 + q13 + q25 + ... + q122 / 12 as domain_score_14
...但是我需要根据非空值提取空值和平均值。因此,对于 domain_score_1
(有 11 个项目),我需要执行以下操作:
Input: 3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1
(3 + 5 + 7 + 2 + 3 + 1 + 5 + 1) / (11 - 3)
27 / 8
3.375
我正在考虑的一个简单算法是:
输入:
3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1
如果 NULL,则将每个值合并为 0:
3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
求和:
27
获取非的数量-通过转换值来归零> 0 到 1 和求和:
3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1
8
将这两个数字相除
27 / 8
3.375
但这似乎需要更多的编程工作。有没有一种我不知道的优雅方法?
更新:
除非我误解了什么,否则 avg()
对此不起作用。我想做的示例:
select avg(q7, q33, q38, ..., q119) from survey;
输出:
SQL error near line 3: wrong number of arguments to function avg()
I need to average some values in a row-wise fashion, rather than a column-wise fashion. (If I were doing a column-wise average, I could just use avg()
). My specific application of this requires me ignore NULLs in averaging. It's pretty straightforward logic, but seems awfully difficult to do in SQL. Is there an elegant way of doing my calculation?
I'm using SQLite3, for what it's worth.
Details
If you need more details, here's an illustration:
I have a a table with a survey:
| q1 | q2 | q3 | ... | q144 |
|----|-------|-------|-----|------|
| 1 | 3 | 7 | ... | 2 |
| 4 | 2 | NULL | ... | 1 |
| 5 | NULL | 2 | ... | 3 |
(Those are just some example values and simple column names. The valid values are 1 through 7 and NULL.)
I need to calculate some averages like so:
q7 + q33 + q38 + q40 + ... + q119 / 11 as domain_score_1
q10 + q11 + q34 + q35 + ... + q140 / 13 as domain_score_2
...
q2 + q5 + q13 + q25 + ... + q122 / 12 as domain_score_14
...but i need to pull out the nulls and average based on the non-nulls. So, for domain_score_1
(which has 11 items), I would need to do:
Input: 3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1
(3 + 5 + 7 + 2 + 3 + 1 + 5 + 1) / (11 - 3)
27 / 8
3.375
A simple algorithm I'm considering is:
Input:
3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1
Coalesce each value to 0 if NULL:
3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
Sum:
27
Get the number of non-zeros by converting values > 0 to 1 and sum:
3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1
8
Divide those two numbers
27 / 8
3.375
But that seems like a lot more programming than this should take. Is there an elegant way of doing this that I'm not aware of?
Update:
Unless I'm misunderstanding something, avg()
won't work for this. Example of what I would want to do:
select avg(q7, q33, q38, ..., q119) from survey;
Output:
SQL error near line 3: wrong number of arguments to function avg()
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
在标准 SQL 中,
如果为 null,SUM 将合并为 0,而 COUNT 将不计算 NULL。
(希望SQLite3符合)。
编辑:检查 http://www.sqlite.org/lang_aggfunc.html 并且 SQLite 符合;如果 sum() 会溢出,您可以使用 Total() 代替。
另外,我赞同重新规范化的观点,如果你不规范你的表设计(每当你看到名称中带有数字的列时就会发出危险信号),你将不会有优雅的 SQL。
In standard SQL
would give you what you want SUM will coalesce to 0 if null and COUNT will not count NULLs.
(hope SQLite3 conforms).
EDIT: Checked the http://www.sqlite.org/lang_aggfunc.html and SQLite conforms; if sum() is going to overflow you could use total() instead.
Also I second the opinions re normalization, if you don't normalize your table design (and whenever you see columns with numbers in their names raise a red flag) you are not going to have elegant SQL.
AVG
已经忽略空值并执行您想要的操作:来自 http://www.sqlite.org/lang_aggfunc.html
所以你可以采取每个域的值并将它们加载到另一个表中,然后在该表上运行平均值。或者,您也可以将宽桌子取消旋转并对其进行平均值。
AVG
适用于列,不适用于行。因此,如果您取消旋转表格,则可以使用AVG
而不会遇到您所面临的问题。让我们看一个小例子:您有一个表,它看起来像这样:
您想要对 q1 和 q2 进行平均,因为它们位于同一域中,但它们是单独的列,因此您不能这样做。但是,如果您将表格更改为如下所示:
那么您可以轻松获取两个问题的平均值:
如果您想要每个 ID 的平均值而不是全局平均值,则可以按 ID 分组:
AVG
already ignores nulls and does what you want:From http://www.sqlite.org/lang_aggfunc.html
So you can probably take the values you have per domain and load them into another table and then just run averages on that table. Or you could just unpivot your wide table and run averages on that, too.
AVG
works on columns, not rows. So if you unpivoted your table you could useAVG
and not have the problem you're facing. Let's look at a small example:You have a table and it looks like this:
You want to average q1 and q2 together because they're in the same domain, but they're separate columns so you can't. But if you changed your table to look like this:
Then you could take the average of the two questions easily:
And you can group by ID if you want an average per ID rather than a global average:
这将是一个巨大的查询,但您可以这样做:
这会将列转换为行并使用
AVG()
函数。当然,您可能只想将其用于特定的调查记录,因此不要忘记 WHERE 子句:
如果您将 q 列规范化到它们自己的表中,每行一个问题和引用,您会轻松很多回到调查。调查和问题之间存在一对多关系。
This is going to be a monstrous query, but you could do this:
This converts your columns to rows and uses the
AVG()
function.Of course, you probably want this for just a particular survey record, so don't forget the WHERE clause:
You'd have a lot easier time if you normalized the q columns into their own table, with one question per row, and references back to survey. You'd have a 1 to many relationship between survey and question.
使用单独的表来存储不同问题的调查分数(假设 q 是因为问题)。类似以下内容
之后您可以运行如下查询
Use a separate table to store survey scores for different questions (assuming that q's are because of question). Something like following
After that you can run query like
使用:
不要使用
UNION
- 如果存在重复项,您需要重复项。Use:
Don't use
UNION
- you want duplicates if they exist.