How to find out whether a date is missing in a Snowflake database, and get the values for a missing date from the next available date

Posted on 2025-01-11 11:49:20

I have a list of dates and serial numbers, and for each date I want to fetch the values of all columns from a Snowflake database. If the date exists in the database, I can already get the values of the selected columns. If the date does not exist, I want the next available date instead, together with the values of all columns for that date. Note that the database I am working on is large (terabytes).

This is an example:

Date        SN        Output
2020-08-18  SSK00100  7
2020-08-19  SSK00100  15
2020-10-20  SSK00100  12
2020-10-21  SSK00100  19
2021-01-05  SSK00105  101
2021-01-08  SSK00106  112

This is the output I am looking for:

Date        SN        Output_1
2020-08-19  SSK00100  15

(The date 2020-08-19 exists in the database, so I should get its own row.)

If the date is not available in the database, for example '2021-01-07', I should get the row for the next available date, with every column:

Date        SN        Output_1
2021-01-08  SSK00106  112
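To make the rule concrete, here is a minimal in-memory sketch of the lookup I have in mind, using `bisect_left` on the sample rows above. The function name `row_on_or_after` is made up for illustration, and the lookup is by date only, as in the expected output; it assumes the rows are already sorted by date.

```python
from bisect import bisect_left
from datetime import date

# Sample rows copied from the example table: (date, serial number, output).
ROWS = [
    (date(2020, 8, 18), "SSK00100", 7),
    (date(2020, 8, 19), "SSK00100", 15),
    (date(2020, 10, 20), "SSK00100", 12),
    (date(2020, 10, 21), "SSK00100", 19),
    (date(2021, 1, 5), "SSK00105", 101),
    (date(2021, 1, 8), "SSK00106", 112),
]


def row_on_or_after(rows, wanted):
    """Return the row dated `wanted`, else the next later row, else None.

    Hypothetical helper: `rows` must be sorted by date in ascending order.
    """
    dates = [r[0] for r in rows]
    i = bisect_left(dates, wanted)  # index of first date >= wanted
    return rows[i] if i < len(rows) else None


print(row_on_or_after(ROWS, date(2020, 8, 19)))  # date exists
print(row_on_or_after(ROWS, date(2021, 1, 7)))   # date missing, next row is used
```

This only illustrates the intended behaviour on a handful of rows; on a terabyte-sized table the same rule would have to run inside Snowflake rather than in Python.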

This is my code:

# Import packages
import numpy as np
import pandas as pd
# src -> four Python files: get data, load data, splitting data, and model building
from src.load_data import run_query
from IPython.display import display
from bisect import bisect_left
import os
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import matplotlib.style as style
from six import string_types

# Code
def sql_fun(serial, start):
    # Filter on one serial number and one exact timestamp.
    # Because this is an exact match on OCUR_TS, a missing date returns no rows.
    where_clause = f"SER_NUM = '{serial}' and OCUR_TS = '{start}'"

    sql = f"""
        with grouped_table as (
            select Date,
                   SN,
                   Output,
                   count(Output) over (order by Date) as _grp
            from "CD_EQUIP_DATA_PROD_DB"."EDW_BI_SL_M1"."EQP_EVNT_VIMS"
            where {where_clause}
            order by Date
        ), final_table as (
            select Date,
                   SN,
                   Output,
                   _grp,
                   first_value(Output) over (partition by _grp order by Date) as Output_1
            from grouped_table
            where {where_clause}
            order by Date
        )
        select Date,
               SN,
               Output_1
        from final_table
        where {where_clause}
        order by Date
    """
    df = run_query(sql, server="east-2")
    return df

df_SSP00180_1 = sql_fun('SSK00100', '2020-08-19')
df_SSP00238_2 = sql_fun('SSK00109', '2021-01-07')

print(df_SSP00180_1)
print(df_SSP00238_2)
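One direction that might work, as a sketch only: instead of matching the date exactly, ask for rows on or after the requested date, sort by the timestamp, and keep the first row. The function name `build_next_date_query` is hypothetical. It assumes `SER_NUM` and `OCUR_TS` (the columns from my WHERE clause) are the real serial-number and timestamp columns, and that `start` is an ISO date string.

```python
from datetime import date

# Table name copied from the query in the question.
TABLE = '"CD_EQUIP_DATA_PROD_DB"."EDW_BI_SL_M1"."EQP_EVNT_VIMS"'


def build_next_date_query(serial, start):
    """Build SQL for the first row of `serial` on or after `start` (YYYY-MM-DD).

    Hypothetical helper; SER_NUM and OCUR_TS are assumed column names.
    """
    date.fromisoformat(start)  # raises ValueError on a malformed date
    safe_serial = serial.replace("'", "''")  # escape single quotes in the literal
    return f"""
        select *
        from {TABLE}
        where SER_NUM = '{safe_serial}'
          and OCUR_TS >= '{start}'
        order by OCUR_TS
        limit 1
    """


sql = build_next_date_query("SSK00109", "2021-01-07")
print(sql)
# Possible usage with the run_query helper from the question:
# df = run_query(sql, server="east-2")
```

If the requested date exists, its own row sorts first; if not, the earliest later row is returned, which matches the expected output above. Whether this is fast enough on a terabyte-sized table probably depends on how the table is clustered, which I have not verified.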

I would really appreciate your help. Thanks a lot!
