气流:错误&quot用UTF-16LE进行解码。使用多个连接时

发布于 2025-02-13 19:25:23 字数 6225 浏览 1 评论 0原文

我的ETL过程有问题。 我有ETL流程,用Python编写,并且效果很好,但是操作 开始一个接一个,因此整个过程持续很长时间。 我在Apache气流中有点新鲜,但是我已经挖了一个问题,有一个问题 和他一起) 我遇到了一个错误:

 File "/usr/lib/python3.8/encodings/utf_16_le.py", line 15, in decode
    def decode(input, errors='strict'):
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 1543, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal

The above exception was the direct cause of the following exception:

airflow.exceptions.AirflowException: decoding with 'utf-16le' codec failed (AirflowException: Task received SIGTERM signal)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 716, in do_execute
    cursor.execute(statement, parameters)
SystemError: <class 'pyodbc.Error'> returned a result with an error set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 896, in _rollback_impl
    self.engine.dialect.do_rollback(self.connection)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 666, in do_rollback
    dbapi_connection.rollback()
pyodbc.OperationalError: ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]Communication link failure (0) (SQLEndTran)') 

这是我任务的代码。可以立即有多达10个连接:

def update_from_gladiator_ost(market_id):    
    query = "DELETE from [stage].[dbo].[rests_by_docs_temp] where market_id = %d" % market_id
    execute_query_dwh(query)
    engine = dwh_conn()
    connection = engine.raw_connection()
    abc = connection.cursor()
    # abc.execute("DELETE from [stage].[dbo].[sell_movement_temp]; DELETE from [stage].[dbo].[rests_by_docs_temp]")
    df_op = pd.read_sql(
            "SET NOCOUNT ON exec [dbo].[mp_report_finance_agent_enhanced_basis_transport_royalty_NC_ost_by_docs4] @pmarket_id = %d, @pstart_date = '%s', @pend_date = '%s', @pselect = '1'" % (
            market_id, z, w), gladiator_conn())
    df_op = df_op.fillna(value=0)
    for row_count in range(0, df_op.shape[0]):
        chunk = df_op.iloc[row_count:row_count + 1, :].values.tolist()
        tuple_of_tuples = tuple(tuple(x) for x in chunk)
        abc.executemany(
                "insert into stage.dbo.rests_by_docs_temp" + " ([date_start],[market_id],[good_id],[agent_id],[doc_id],[tstart_qty],[tstart_amt],[IMP],[doc_name]) values   (?,?,?,?,?,?,?,?,?)",
                tuple_of_tuples)
        abc.commit()
    connection.close()

如您所见,我从数据库中获取数据并将其插入我的DWH

,这是我的连接:

def dwh_conn():
    mySQL = '192.168.240.1'
    myDB = 'DWH'
    login = 'sa'
    PWD = '....'
    Encrypt = 'No'
    Certificate = 'Yes'

    params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
                                     "SERVER=" + mySQL + ";"
                                                         "SERVER=" + mySQL + ";"
                                                                             "Port=1433" + ";"
                                                                                           "DATABASE=" + myDB + ";"
                                                                                                                "UID=" + login + ";"
                                                                                                                                 "PWD=" + PWD + ";"
                                                                                                                                                "Encrypt=" + Encrypt + ";"
                                                                                                                                                                       "TrustServerCertificate=" + Certificate + ";")
    engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
    return engine


def gladiator_conn():
    mySQL = '...'
    myDB = '...'
    login = '...'
    PWD = '...'
    Encrypt = 'No'
    Certificate = 'Yes'
    params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
                                     "SERVER=" + mySQL + ";"
                                                         "Port=1433" + ";"
                                                                       "DATABASE=" + myDB + ";"
                                                                                            "UID=" + login + ";"
                                                                                                             "PWD=" + PWD + ";"
                                                                                                                            "Encrypt=" + Encrypt + ";"
                                                                                                                                                   "TrustServerCertificate=" + Certificate + ";")
    engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
    return engine

我认为问题是在UnixoDBC中。因为当我在Windows上使用Pycharm进行整个代码时,每个人都可以。 但是在Docker Ubuntu/气流上 - 有时会失败。 我可以重新启动失败的任务,并且可以很好地进行,但可能会再次

更新失败: 我想,我找到了一种解决方案,但我在案件中无法意识到这一点。

def decode_sketchy_utf16(raw_bytes):
    s = raw_bytes.decode("utf-16le", "ignore")
    try:
        n = s.index('\u0000')
        s = s[:n]  # respect null terminator
    except ValueError:
        pass
    return s

# ...

prev_converter = cnxn.get_output_converter(pyodbc.SQL_WVARCHAR)
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, decode_sketchy_utf16)
col_info = crsr.columns("Clients").fetchall()
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, prev_converter)  # restore previous behaviour

帮助我如何使其在我的代码中起作用?我应该在哪里实施它?

Have a problem with my ETL process.
I've got ETL process, written in python and it works great, but operations
starts one after another, so the whole process lasts much time.
I'm slightly new in Apache Airflow, but I've made a DUG and there is a problem
with him)
I get a mistake:

 File "/usr/lib/python3.8/encodings/utf_16_le.py", line 15, in decode
    def decode(input, errors='strict'):
  File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 1543, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal

The above exception was the direct cause of the following exception:

airflow.exceptions.AirflowException: decoding with 'utf-16le' codec failed (AirflowException: Task received SIGTERM signal)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 716, in do_execute
    cursor.execute(statement, parameters)
SystemError: <class 'pyodbc.Error'> returned a result with an error set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 896, in _rollback_impl
    self.engine.dialect.do_rollback(self.connection)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 666, in do_rollback
    dbapi_connection.rollback()
pyodbc.OperationalError: ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]Communication link failure (0) (SQLEndTran)') 

Here is a code of my Task. There can be up to 10 connections at once:

def update_from_gladiator_ost(market_id):    
    query = "DELETE from [stage].[dbo].[rests_by_docs_temp] where market_id = %d" % market_id
    execute_query_dwh(query)
    engine = dwh_conn()
    connection = engine.raw_connection()
    abc = connection.cursor()
    # abc.execute("DELETE from [stage].[dbo].[sell_movement_temp]; DELETE from [stage].[dbo].[rests_by_docs_temp]")
    df_op = pd.read_sql(
            "SET NOCOUNT ON exec [dbo].[mp_report_finance_agent_enhanced_basis_transport_royalty_NC_ost_by_docs4] @pmarket_id = %d, @pstart_date = '%s', @pend_date = '%s', @pselect = '1'" % (
            market_id, z, w), gladiator_conn())
    df_op = df_op.fillna(value=0)
    for row_count in range(0, df_op.shape[0]):
        chunk = df_op.iloc[row_count:row_count + 1, :].values.tolist()
        tuple_of_tuples = tuple(tuple(x) for x in chunk)
        abc.executemany(
                "insert into stage.dbo.rests_by_docs_temp" + " ([date_start],[market_id],[good_id],[agent_id],[doc_id],[tstart_qty],[tstart_amt],[IMP],[doc_name]) values   (?,?,?,?,?,?,?,?,?)",
                tuple_of_tuples)
        abc.commit()
    connection.close()

As you see, I get data from database and INSERT it in my DWH

And here is my connections:

def dwh_conn():
    mySQL = '192.168.240.1'
    myDB = 'DWH'
    login = 'sa'
    PWD = '....'
    Encrypt = 'No'
    Certificate = 'Yes'

    params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
                                     "SERVER=" + mySQL + ";"
                                                         "SERVER=" + mySQL + ";"
                                                                             "Port=1433" + ";"
                                                                                           "DATABASE=" + myDB + ";"
                                                                                                                "UID=" + login + ";"
                                                                                                                                 "PWD=" + PWD + ";"
                                                                                                                                                "Encrypt=" + Encrypt + ";"
                                                                                                                                                                       "TrustServerCertificate=" + Certificate + ";")
    engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
    return engine


def gladiator_conn():
    mySQL = '...'
    myDB = '...'
    login = '...'
    PWD = '...'
    Encrypt = 'No'
    Certificate = 'Yes'
    params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
                                     "SERVER=" + mySQL + ";"
                                                         "Port=1433" + ";"
                                                                       "DATABASE=" + myDB + ";"
                                                                                            "UID=" + login + ";"
                                                                                                             "PWD=" + PWD + ";"
                                                                                                                            "Encrypt=" + Encrypt + ";"
                                                                                                                                                   "TrustServerCertificate=" + Certificate + ";")
    engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
    return engine

I think the problem is in unixODBC. Because when I do the whole code in Pycharm on Windows - everythong is fine.
But on docker Ubuntu/Airflow - it sometimes fails.
I can restart the task which failed and it can go fine but can fail again

updated:
I guess, I Found one solution but I cant realize it on my case.

def decode_sketchy_utf16(raw_bytes):
    s = raw_bytes.decode("utf-16le", "ignore")
    try:
        n = s.index('\u0000')
        s = s[:n]  # respect null terminator
    except ValueError:
        pass
    return s

# ...

prev_converter = cnxn.get_output_converter(pyodbc.SQL_WVARCHAR)
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, decode_sketchy_utf16)
col_info = crsr.columns("Clients").fetchall()
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, prev_converter)  # restore previous behaviour

Help me how to make it work in my code? Where should I implement it?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

妳是的陽光 2025-02-20 19:25:23

找到答案。当我缺乏记忆(手术)时,这些问题就会上升。尤其是当几个容器打开时,可能会遇到此错误

Found an answer. These problem rises when I'm lack of memory (operative). Especially when several containers on, it might go to this error

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文