SQLite3中计算多列平均值

发布于 2024-08-26 23:14:48 字数 1679 浏览 10 评论 0原文

我需要以方式对一些值进行平均,而不是方式。 (如果我要按列求平均值,我可以使用 avg())。我对此的具体应用要求我在求平均值时忽略 NULL。这是非常简单的逻辑,但在 SQL 中似乎很难做到。有没有一种优雅的方法来进行计算?

我正在使用 SQLite3,因为它的价值。

详细信息

如果您需要更多详细信息,请参阅以下示例:

我有一个包含调查的表:(

| q1 | q2    | q3    | ... | q144 |
|----|-------|-------|-----|------|
| 1  | 3     | 7     | ... | 2    |
| 4  | 2     | NULL  | ... | 1    |
| 5  | NULL  | 2     | ... | 3    |

这些只是一些示例值和简单的列名称。有效值为 1 到 7 和 NULL。)

I需要像这样计算一些平均值:

q7 + q33 + q38 + q40 + ... + q119 / 11 as domain_score_1
q10 + q11 + q34 + q35 + ... + q140 / 13 as domain_score_2
...
q2 + q5 + q13 + q25 + ... + q122 / 12 as domain_score_14

...但是我需要根据非空值提取空值和平均值。因此,对于 domain_score_1 (有 11 个项目),我需要执行以下操作:

Input:  3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1

(3 + 5 + 7 + 2 + 3 + 1 + 5 + 1) / (11 - 3)
27 / 8
3.375

我正在考虑的一个简单算法是:

输入:

3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1 

如果 NULL,则将每个值合并为 0:

3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1

求和:

27

获取非的数量-通过转换值来归零> 0 到 1 和求和:

3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1
8

将这两个数字相除

27 / 8
3.375

但这似乎需要更多的编程工作。有没有一种我不知道的优雅方法?

更新:

除非我误解了什么,否则 avg() 对此不起作用。我想做的示例:

select avg(q7, q33, q38, ..., q119) from survey;

输出:

SQL error near line 3: wrong number of arguments to function avg()

I need to average some values in a row-wise fashion, rather than a column-wise fashion. (If I were doing a column-wise average, I could just use avg()). My specific application of this requires me ignore NULLs in averaging. It's pretty straightforward logic, but seems awfully difficult to do in SQL. Is there an elegant way of doing my calculation?

I'm using SQLite3, for what it's worth.

Details

If you need more details, here's an illustration:

I have a a table with a survey:

| q1 | q2    | q3    | ... | q144 |
|----|-------|-------|-----|------|
| 1  | 3     | 7     | ... | 2    |
| 4  | 2     | NULL  | ... | 1    |
| 5  | NULL  | 2     | ... | 3    |

(Those are just some example values and simple column names. The valid values are 1 through 7 and NULL.)

I need to calculate some averages like so:

q7 + q33 + q38 + q40 + ... + q119 / 11 as domain_score_1
q10 + q11 + q34 + q35 + ... + q140 / 13 as domain_score_2
...
q2 + q5 + q13 + q25 + ... + q122 / 12 as domain_score_14

...but i need to pull out the nulls and average based on the non-nulls. So, for domain_score_1 (which has 11 items), I would need to do:

Input:  3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1

(3 + 5 + 7 + 2 + 3 + 1 + 5 + 1) / (11 - 3)
27 / 8
3.375

A simple algorithm I'm considering is:

Input:

3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1 

Coalesce each value to 0 if NULL:

3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1

Sum:

27

Get the number of non-zeros by converting values > 0 to 1 and sum:

3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1
8

Divide those two numbers

27 / 8
3.375

But that seems like a lot more programming than this should take. Is there an elegant way of doing this that I'm not aware of?

Update:

Unless I'm misunderstanding something, avg() won't work for this. Example of what I would want to do:

select avg(q7, q33, q38, ..., q119) from survey;

Output:

SQL error near line 3: wrong number of arguments to function avg()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

放血 2024-09-02 23:14:48

在标准 SQL 中,

SELECT 
(SUM(q7)+SUM(q33)+SUM(q38)+SUM(q40)+..+SUM(q119))/
(COUNT(q7)+COUNT(q33)+COUNT(q38)+COUNT(q40)+..+COUNT(q119)) AS domain_score1 
FROM survey

如果为 null,SUM 将合并为 0,而 COUNT 将不计算 NULL。
(希望SQLite3符合)。

编辑:检查 http://www.sqlite.org/lang_aggfunc.html 并且 SQLite 符合;如果 sum() 会溢出,您可以使用 Total() 代替。

另外,我赞同重新规范化的观点,如果你不规范你的表设计(每当你看到名称中带有数字的列时就会发出危险信号),你将不会有优雅的 SQL。

In standard SQL

SELECT 
(SUM(q7)+SUM(q33)+SUM(q38)+SUM(q40)+..+SUM(q119))/
(COUNT(q7)+COUNT(q33)+COUNT(q38)+COUNT(q40)+..+COUNT(q119)) AS domain_score1 
FROM survey

would give you what you want SUM will coalesce to 0 if null and COUNT will not count NULLs.
(hope SQLite3 conforms).

EDIT: Checked the http://www.sqlite.org/lang_aggfunc.html and SQLite conforms; if sum() is going to overflow you could use total() instead.

Also I second the opinions re normalization, if you don't normalize your table design (and whenever you see columns with numbers in their names raise a red flag) you are not going to have elegant SQL.

草莓味的萝莉 2024-09-02 23:14:48

AVG 已经忽略空值并执行您想要的操作:

avg() 函数返回组内所有非 NULL X 的平均值。看起来不像数字的字符串和 BLOB 值被解释为 0。只要有至少一个非 NULL 输入,即使所有输入都是整数,avg() 的结果也始终是浮点值。当且仅当没有非 NULL 输入时,avg() 的结果才为 NULL。

来自 http://www.sqlite.org/lang_aggfunc.html

所以你可以采取每个域的值并将它们加载到另一个表中,然后在该表上运行平均值。或者,您也可以将宽桌子取消旋转并对其进行平均值。


AVG 适用于列,不适用于行。因此,如果您取消旋转表格,则可以使用 AVG 而不会遇到您所面临的问题。让我们看一个小例子:

您有一个表,它看起来像这样:

ID  | q1  | q2  | q3
----------------------
1   | 1   | 2   | NULL
2   | NULL| 2   | 56

您想要对 q1 和 q2 进行平均,因为它们位于同一域中,但它们是单独的列,因此您不能这样做。但是,如果您将表格更改为如下所示:

ID  | question | value
-----------------------
1   | 1        | 1
1   | 2        | 2
1   | 3        | NULL
2   | 1        | NULL
2   | 2        | 2
2   | 3        | 56

那么您可以轻松获取两个问题的平均值:

SELECT AVG(value)
FROM Table
WHERE question IN (1,2)

如果您想要每个 ID 的平均值而不是全局平均值,则可以按 ID 分组:

SELECT ID, AVG(value)
FROM Table
WHERE question IN (1,2)
GROUP BY ID

AVG already ignores nulls and does what you want:

The avg() function returns the average value of all non-NULL X within a group. String and BLOB values that do not look like numbers are interpreted as 0. The result of avg() is always a floating point value as long as at there is at least one non-NULL input even if all inputs are integers. The result of avg() is NULL if and only if there are no non-NULL inputs.

From http://www.sqlite.org/lang_aggfunc.html

So you can probably take the values you have per domain and load them into another table and then just run averages on that table. Or you could just unpivot your wide table and run averages on that, too.


AVG works on columns, not rows. So if you unpivoted your table you could use AVG and not have the problem you're facing. Let's look at a small example:

You have a table and it looks like this:

ID  | q1  | q2  | q3
----------------------
1   | 1   | 2   | NULL
2   | NULL| 2   | 56

You want to average q1 and q2 together because they're in the same domain, but they're separate columns so you can't. But if you changed your table to look like this:

ID  | question | value
-----------------------
1   | 1        | 1
1   | 2        | 2
1   | 3        | NULL
2   | 1        | NULL
2   | 2        | 2
2   | 3        | 56

Then you could take the average of the two questions easily:

SELECT AVG(value)
FROM Table
WHERE question IN (1,2)

And you can group by ID if you want an average per ID rather than a global average:

SELECT ID, AVG(value)
FROM Table
WHERE question IN (1,2)
GROUP BY ID
兔小萌 2024-09-02 23:14:48

这将是一个巨大的查询,但您可以这样做:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey) UNION ALL
(SELECT q33 FROM survey) UNION ALL
(SELECT q38 FROM survey) UNION ALL
...
(SELECT q119 FROM survey))

这会将列转换为行并使用 AVG() 函数。

当然,您可能只想将其用于特定的调查记录,因此不要忘记 WHERE 子句:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q33 FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q38 FROM survey WHERE survey_id = 1) UNION ALL
...
(SELECT q119 FROM survey WHERE survey_id = 1))

如果您将 q 列规范化到它们自己的表中,每行一个问题和引用,您会轻松很多回到调查。调查和问题之间存在一对多关系。

This is going to be a monstrous query, but you could do this:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey) UNION ALL
(SELECT q33 FROM survey) UNION ALL
(SELECT q38 FROM survey) UNION ALL
...
(SELECT q119 FROM survey))

This converts your columns to rows and uses the AVG() function.

Of course, you probably want this for just a particular survey record, so don't forget the WHERE clause:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q33 FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q38 FROM survey WHERE survey_id = 1) UNION ALL
...
(SELECT q119 FROM survey WHERE survey_id = 1))

You'd have a lot easier time if you normalized the q columns into their own table, with one question per row, and references back to survey. You'd have a 1 to many relationship between survey and question.

执手闯天涯 2024-09-02 23:14:48

使用单独的表来存储不同问题的调查分数(假设 q 是因为问题)。类似以下内容

SurveyTable(SurveyId, ...)
SurveyRatings(SurveyId, QuestionId, Rating)

之后您可以运行如下查询

SELECT avg(Rating) WHERE SurveyId=?

Use a separate table to store survey scores for different questions (assuming that q's are because of question). Something like following

SurveyTable(SurveyId, ...)
SurveyRatings(SurveyId, QuestionId, Rating)

After that you can run query like

SELECT avg(Rating) WHERE SurveyId=?
南渊 2024-09-02 23:14:48

使用:

SELECT AVG(x.answer)
  FROM (SELECT s.q7 AS answer
          FROM SURVEY s
        UNION ALL
        SELECT s.q33
          FROM SURVEY s
        UNION ALL    
       SELECT s.q38
         FROM SURVEY s
       ...
       UNION ALL
       SELECT s.q119
         FROM SURVEY s) x

不要使用UNION - 如果存在重复项,您需要重复项。

Use:

SELECT AVG(x.answer)
  FROM (SELECT s.q7 AS answer
          FROM SURVEY s
        UNION ALL
        SELECT s.q33
          FROM SURVEY s
        UNION ALL    
       SELECT s.q38
         FROM SURVEY s
       ...
       UNION ALL
       SELECT s.q119
         FROM SURVEY s) x

Don't use UNION - you want duplicates if they exist.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文