How to replace null values with the mean of a category in SQL?
I have a dataset with null values in the column 'revenues_from_appointment'.
Dataset
appointment_date | patient_id | practitioner_id | appointment_duration_min | revenues_from_appointment |
---|---|---|---|---|
2021-06-28 | 42734 | 748 | 30 | 90.0 |
2021-06-29 | 42737 | 747 | 60 | 150.0 |
2021-07-01 | 42737 | 747 | 60 | NaN |
2021-07-03 | 42736 | 748 | 30 | 60.0 |
2021-07-03 | 42735 | 747 | 15 | 42.62 |
2021-07-04 | 42734 | 748 | 30 | NaN |
2021-07-05 | 42734 | 748 | 30 | 100.0 |
2021-07-10 | 42738 | 747 | 15 | 50.72 |
2021-08-12 | 42739 | 748 | 30 | 73.43 |
I want to replace each NULL value with the mean of the rows that share the same "patient_id, practitioner_id, appointment_duration_min".
I did it using a pandas DataFrame:
df['revenues_from_appointment'] = df['revenues_from_appointment'].fillna(df.groupby(['patient_id', 'practitioner_id', 'appointment_duration_min'])['revenues_from_appointment'].transform('mean'))
How can we obtain the same result by using SQL?
Final Output
appointment_date | patient_id | practitioner_id | appointment_duration_min | revenues_from_appointment |
---|---|---|---|---|
2021-06-28 | 42734 | 748 | 30 | 90.0 |
2021-06-29 | 42737 | 747 | 60 | 150.0 |
2021-07-01 | 42737 | 747 | 60 | 150.0 |
2021-07-03 | 42736 | 748 | 30 | 60.0 |
2021-07-03 | 42735 | 747 | 15 | 42.62 |
2021-07-04 | 42734 | 748 | 30 | 95.0 |
2021-07-05 | 42734 | 748 | 30 | 100.0 |
2021-07-10 | 42738 | 747 | 15 | 50.72 |
2021-08-12 | 42739 | 748 | 30 | 73.43 |
Comments (1)
You can use the AVG window function, partitioned on the three columns of interest, and replace the null values using the COALESCE function. Try it here.
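The query described above might look like the following minimal sketch. It runs against an in-memory SQLite database through Python's standard sqlite3 module so it can be tried without a server. The table name appointments and the helper fill_with_group_mean are assumptions made for illustration (the question does not name the table), and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; None stands for NULL / NaN.
ROWS = [
    ("2021-06-28", 42734, 748, 30, 90.0),
    ("2021-06-29", 42737, 747, 60, 150.0),
    ("2021-07-01", 42737, 747, 60, None),
    ("2021-07-03", 42736, 748, 30, 60.0),
    ("2021-07-03", 42735, 747, 15, 42.62),
    ("2021-07-04", 42734, 748, 30, None),
    ("2021-07-05", 42734, 748, 30, 100.0),
    ("2021-07-10", 42738, 747, 15, 50.72),
    ("2021-08-12", 42739, 748, 30, 73.43),
]

# AVG ... OVER (PARTITION BY ...) computes the group mean on every row
# (NULLs are ignored by AVG); COALESCE keeps the original value when it
# is not NULL and falls back to the group mean otherwise.
QUERY = """
SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(
           revenues_from_appointment,
           AVG(revenues_from_appointment) OVER (
               PARTITION BY patient_id, practitioner_id, appointment_duration_min
           )
       ) AS revenues_from_appointment
FROM appointments  -- hypothetical table name
ORDER BY appointment_date, patient_id
"""


def fill_with_group_mean(rows):
    """Load rows into a temporary table and return them with NULLs filled."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """
            CREATE TABLE appointments (
                appointment_date TEXT,
                patient_id INTEGER,
                practitioner_id INTEGER,
                appointment_duration_min INTEGER,
                revenues_from_appointment REAL
            )
            """
        )
        conn.executemany("INSERT INTO appointments VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in fill_with_group_mean(ROWS):
    print(row)
```

As with the pandas version, a group whose values are all NULL has no mean to fall back on, so its rows stay NULL. The SELECT only returns filled values; it does not modify the stored table.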