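How to find out whether a date is missing in a Snowflake database and get the values for the missing date based on values from other dates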
I have a list of dates and serial numbers, and I want to get the values of all columns for each date from a Snowflake database. If the date exists in the database, I can get the values of the selected columns. If the date does not exist, I should instead get the next available date and the values of all columns for that date. Note that the database I am working with is large (terabytes).
This is an example:
Date SN Output
2020-08-18 SSK00100 7
2020-08-19 SSK00100 15
2020-10-20 SSK00100 12
2020-10-21 SSK00100 19
2021-01-05 SSK00105 101
2021-01-08 SSK00106 112
This is the output I am looking for:
Date SN Output_1
2020-08-19 SSK00100 15 (for the date: 2020-08-19, I should get this data)
If the date is not available in the database, for example '2021-01-07', I should get the data of the next available date for each column:
Date SN Output_1
2021-01-08 SSK00106 112
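To make the lookup rule concrete (the exact date if it exists, otherwise the next later date), here is a minimal in-memory sketch on the sample rows from the example above. It is only an illustration of the rule, not something to run against a terabyte-scale table. It assumes the lookup is done per serial number, and the name `next_available` is hypothetical.

```python
from bisect import bisect_left
from datetime import date

# Sample rows from the example above: (date, serial number, output).
ROWS = [
    (date(2020, 8, 18), "SSK00100", 7),
    (date(2020, 8, 19), "SSK00100", 15),
    (date(2020, 10, 20), "SSK00100", 12),
    (date(2020, 10, 21), "SSK00100", 19),
    (date(2021, 1, 5), "SSK00105", 101),
    (date(2021, 1, 8), "SSK00106", 112),
]


def next_available(rows, serial, wanted):
    """Return the row for `serial` on `wanted`, else the next later row, else None."""
    # Keep only this serial's rows, sorted by date so bisect can be used.
    candidates = sorted(r for r in rows if r[1] == serial)
    dates = [r[0] for r in candidates]
    # bisect_left gives the index of the first date >= wanted.
    i = bisect_left(dates, wanted)
    return candidates[i] if i < len(candidates) else None


print(next_available(ROWS, "SSK00100", date(2020, 8, 19)))  # exact match
print(next_available(ROWS, "SSK00106", date(2021, 1, 7)))   # falls forward to 2021-01-08
```

If no row exists on or after the requested date, the sketch returns `None`; whether that is the desired behaviour is an open assumption.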
This is my code:
Import packages
import numpy as np
import pandas as pd
# src -> four Python files: get data, load data, splitting data, and model building
from src.load_data import run_query
from IPython.display import display
from bisect import bisect_left
import os
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import matplotlib.style as style
from six import string_types
Code
def sql_fun(serial, start):
    # Exact-match filter on serial number and timestamp
    where_clause = f"SER_NUM='{serial}' and OCUR_TS='{start}'"
    sql = f"""
    with grouped_table as (
        select Date,
               SN,
               Output,
               count(Output) over (order by Date) as _grp
        from "CD_EQUIP_DATA_PROD_DB"."EDW_BI_SL_M1"."EQP_EVNT_VIMS"
        where {where_clause}
        ORDER BY Date
    ), final_table as (
        select Date,
               SN,
               Output,
               _grp,
               first_value(Output) over (partition by _grp order by Date) as Output_1
        from grouped_table
        where {where_clause}
        ORDER BY Date
    )
    select Date,
           SN,
           Output_1
    from final_table
    where {where_clause}
    ORDER BY Date
    """
    df = run_query(sql, server="east-2")
    return df

df_SSP00180_1 = sql_fun('SSK00100', '2020-08-19')
df_SSP00238_2 = sql_fun('SSK00109', '2021-01-07')
print(df_SSP00180_1)
print(df_SSP00238_2)
I would really appreciate your help. Thanks a lot.
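One possible direction, sketched under assumptions rather than tested against the real table: replace the exact-match filter with "on or after the requested date", sort by the timestamp, and keep only the first row, so the filtering happens in the database. The column names `SER_NUM` and `OCUR_TS` are taken from the where clause in the question, `select *` stands in for "all columns", and the function name `build_next_date_query` is hypothetical. If `OCUR_TS` is a timestamp rather than a date, the comparison is against midnight of the requested day.

```python
from datetime import date


def build_next_date_query(serial, start):
    """Build SQL for the first row of `serial` on or after `start` (ISO date string)."""
    # Validate inputs before interpolating them into the SQL text.
    date.fromisoformat(start)  # raises ValueError on a malformed date
    if not serial.isalnum():
        raise ValueError(f"unexpected serial number: {serial!r}")
    # SER_NUM and OCUR_TS are assumed from the where clause in the question.
    return f"""
        select *
        from "CD_EQUIP_DATA_PROD_DB"."EDW_BI_SL_M1"."EQP_EVNT_VIMS"
        where SER_NUM = '{serial}'
          and OCUR_TS >= '{start}'
        order by OCUR_TS
        limit 1
    """


sql = build_next_date_query("SSK00109", "2021-01-07")
print(sql)
```

The resulting string could then be passed to the `run_query` helper from the question, for example `run_query(sql, server="east-2")`. For a whole list of dates and serial numbers, running one such query per pair may be slow on a table of this size, so a single set-based join would likely be worth exploring.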