如何通过Python调整MS SQL Server中的变量

发布于 2025-01-25 10:51:03 字数 1523 浏览 2 评论 0原文

我使用MS SQL Server编写了几个SQL查询,并使用以下代码使用PYODBC软件包将它们导入Python。

import pyodbc
import pandas as pd 

def conn_sql_server(file_path):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r') 
    df = pd.read_sql_query(query.read(), conn)
    query.close() 

    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')

在每个SQL查询中,我都使用声明并设置了设置变量(每个SQL查询中的变量都不同)。在这里,我只是从在线复制了一个随机查询,例如。我要做的是直接在Python中更新Year变量。我的实际查询很长,所以我不想在Python的SQL脚本上复制,我只想调整变量。有什么办法吗?

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

我的另一个问题是,是否有任何方法可以添加一个语句,例如,将上述查询导入到Python之后,itemNumber = 1002345?我之所以问这个,是因为DF2是DF1的子集。限制列未选择在输出中显示,因此在DF1中阅读后,我无法在Python中进行过滤。我可以在DF1输出中添加该列,并在Python中进行更多的聚合,但这在很大程度上会增加原始数据大小和运行时间,因此我更喜欢不这样做。

I have several SQL queries written in MS SQL Server and I used the following code to import them into Python using the pyodbc package.

import pyodbc
import pandas as pd 

def conn_sql_server(file_path):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r') 
    df = pd.read_sql_query(query.read(), conn)
    query.close() 

    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')

In each SQL query, I have used DECLARE and SET to set the variables (variables are different in each SQL query). Here, I just copied a random query from online as an example. What I want to do is to update the Year variable directly in Python. My actual query is pretty long, so I don't want to copy over the SQL scripts in Python, I just want to adjust the variables. Any way to do it?

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

My other question is, is there any way to add a WHERE statement, like WHERE itemNumber = 1002345 after importing the above query into Python? I'm asking this because df2 is a subset of df1. The restriction column isn't selected to show in the output, so I cannot do filterings in Python after reading in df1. I could add that column in the df1 output and do more aggregations in Python, but that would largely increase the original data size and running time, so I prefer not to do it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

人海汹涌 2025-02-01 10:51:03

这是您的脚本外观的示例。我们正在进行2个修改:

  • conn_sql_server现在采用以下参数:

    • 年:您可以通过要替换的年份声明@year ...
    • where_clause:a whene您选择的子句
    • tre_clause_starts_with:在此之前子句应放置
    • 的子句

  • modify_query方法,该方法读取文件内容并根据您提供的年份更改内容。如果您提供Where子句,它将在您在fre fore_clause_starts_with

    中提供的子句之前。

import pyodbc
import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)

    new_query = ''.join(new_lines)
    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    df = pd.read_sql_query(new_query, conn)
    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1', 
          year=1999,
          where_clause='WHERE itemNumber = 1002345', 
          before_clause_starts_with='group by')

df2 = conn_sql_server('C:/Users/JJ/SQL script2')

df3 = conn_sql_server('C:/Users/JJ/SQL script3',
          year = 1500)

仿真

中,让我们运行一个示例。

script1.sql

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

script2.sql

DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year

script3.sql

DECLARE @Year INT = 2022;
SELECT GETDATE()

使用与上述类似的脚本它被修改。

模拟脚本

#import pyodbc
#import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)


    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)

    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    #conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
    #                  'Server= servername;'
    #                  'Database = databasename;'
    #                  'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    #df = pd.read_sql_query(new_query, conn)
    #return df   

#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')

df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)

原始查询1在脚本1.sql中像这样

['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', '       SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', '     INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']

,在运行脚本后,查询将变成

DECLARE @Year INT = 1999
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year

Query 3用于看起来像这样的查询:

['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']

它可以

DECLARE @Year INT = 1500
SELECT GETDATE()

通过更改python脚本来拍摄。

Here's a sample of how your script will look like. We are doing 2 modifications:

  • conn_sql_server now takes these parameters:

    • year: you can pass the year you want to replace declare @year...
    • where_clause: a where clause of your choice
    • before_clause_starts_with: the clause before which the where clause should be placed
  • modify_query method that reads the contents of the file and changes the content based on the year you provided. If you provide the where clause, it'll put it before the clause you provide in before_clause_starts_with

import pyodbc
import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)

    new_query = ''.join(new_lines)
    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    df = pd.read_sql_query(new_query, conn)
    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1', 
          year=1999,
          where_clause='WHERE itemNumber = 1002345', 
          before_clause_starts_with='group by')

df2 = conn_sql_server('C:/Users/JJ/SQL script2')

df3 = conn_sql_server('C:/Users/JJ/SQL script3',
          year = 1500)

Simulation

Let's run an example.

script1.sql

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

script2.sql

DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year

script3.sql

DECLARE @Year INT = 2022;
SELECT GETDATE()

Using a script similar to the above, we'll try to see how each script looks like after it gets modified.

Simulation script

#import pyodbc
#import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)


    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)

    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    #conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
    #                  'Server= servername;'
    #                  'Database = databasename;'
    #                  'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    #df = pd.read_sql_query(new_query, conn)
    #return df   

#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')

df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)

Original query 1 was like this in script1.sql

['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', '       SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', '     INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']

After running the script, the query will become

DECLARE @Year INT = 1999
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year

Query 3 used to look like this:

['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']

It becomes

DECLARE @Year INT = 1500
SELECT GETDATE()

Give it a shot by changing the python script as you deem fit.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文