How to update records with SQLAlchemy while looping through them

Published 2024-10-15 21:09:48

I am trying to use SQLSoup - the SQLAlchemy extension - to update records in a SQL Server 2008 database. I am using pyodbc for the connections. There are a number of issues which make it hard to find a relevant example.

I am reprojecting a geometry field in a very large table (2 million+ records), so many of the standard ways of updating fields cannot be used. I need to extract coordinates from the geometry field to text, convert them and pass them back in. All this is fine, and all the individual pieces are working.

However, I want to execute a SQL UPDATE statement on each row while looping through the records one by one. I assume this places locks on the recordset, or that the connection is in use, because when I use the code below it hangs after successfully updating the first record.

Any advice on how to create a new connection, reuse the existing one, or accomplish this another way is appreciated.

from sqlalchemy import select, text

s = select([text("%s as fid" % id_field),
            text("%s.STAsText() as wkt" % geom_field)],
           from_obj=[feature_table])

rs = s.execute()

for row in rs:
    new_wkt = ReprojectFeature(row.wkt)

    #build a raw T-SQL update using geometry::STGeomFromText
    update_value = "geometry::STGeomFromText('%s', %s)" % (new_wkt, "3785")
    update_sql = ("update %s set GEOM3785 = %s where %s = %i" %
                  (full_name, update_value, id_field, row.fid))

    conn = db.connection()
    conn.execute(update_sql)
    conn.close()  #or not - no effect..

The updated working code now looks like this. It works fine on a few records, but hangs on the whole table, so I guess it is reading in too much data.

from sqlalchemy.ext.sqlsoup import SqlSoup
from sqlalchemy.orm import sessionmaker

db = SqlSoup(conn_string)
#create outer query

Session = sessionmaker(autoflush=False, bind=db.engine)
session = Session()
rs = session.execute(s)

for row in rs:
    #create update sql...
    session.execute(update_sql)
session.commit()

I now get connection busy errors.

DBAPIError: (Error) ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')

It looks like this could be a problem with the ODBC driver - http://sourceitsoftware.blogspot.com/2008/06/connection-is-busy-with-results-for.html

Further Update:

On the server, Profiler shows the select statement and then the first update statement "starting", but neither completes.
If I set the select statement to return only the top 10 rows, it completes and the updates run.

SQL: Batch Starting   Select...
SQL: Batch Starting   Update...

I believe this is an issue with pyodbc and the SQL Server drivers. If I remove SQLAlchemy and execute the same SQL with pyodbc it also hangs, even if I create a new connection object for the updates.

I also tried the SQL Server Native Client 10.0 driver, which is meant to allow MARS - Multiple Active Result Sets - but it made no difference. In the end I have resorted to "paging the results" and updating these batches using pyodbc and SQL (see below); however, I thought SQLAlchemy would have been able to do this for me automatically.

Comments (3)

や莫失莫忘 2024-10-22 21:09:51

Until I find another solution I am using a single connection and custom SQL to return sets of records, and updating these in batches. I don't think what I am doing is a particularly unique case, so I am not sure why I cannot handle multiple result sets simultaneously.

The code below works, but is very, very slow..

import pyodbc

cnxn = pyodbc.connect(conn_string, autocommit=True)
cursor = cnxn.cursor()

#get total recs in the database
s = "select count(fid) as count from table"
count = cursor.execute(s).fetchone().count

#choose number of records to update in each iteration
batch_size = 100
counter = 0
for i in range(1, count, batch_size):
    #sql to bring back relevant records in each batch
    #(half-open range, so boundary rows are not fetched twice)
    s = """SELECT fid, wkt FROM
               (SELECT ROW_NUMBER() OVER (ORDER BY fid ASC) AS RowNumber,
                       fid,
                       GEOM29902.STAsText() AS wkt
                FROM %s) features
           WHERE RowNumber >= %i AND RowNumber < %i""" % (full_name, i, i + batch_size)

    rs = cursor.execute(s).fetchall()
    for row in rs:
        new_wkt = ReprojectFeature(row.wkt)
        #...create update sql statement for the record
        cursor.execute(update_sql)
        counter += 1
cursor.close()
cnxn.close()
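
Part of the slowness likely comes from `autocommit=True` committing one round trip per row. A variant of the batch loop that reprojects each page in Python and then writes it back with a single `executemany()` and one commit per batch might look like the sketch below; it is shown against an in-memory SQLite table with made-up names, since the original SQL Server schema is not available.

```python
import sqlite3

#throwaway table standing in for the real feature table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (fid INTEGER PRIMARY KEY, wkt TEXT)")
conn.executemany("INSERT INTO features VALUES (?, ?)",
                 [(i, "POINT(%d 0)" % i) for i in range(1, 501)])
conn.commit()

def reproject(wkt):
    #stand-in for ReprojectFeature()
    return wkt.replace("POINT", "POINT Z")

count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
batch_size = 100
for offset in range(0, count, batch_size):
    #page through the table in fid order
    rows = conn.execute(
        "SELECT fid, wkt FROM features ORDER BY fid LIMIT ? OFFSET ?",
        (batch_size, offset)).fetchall()
    params = [(reproject(wkt), fid) for fid, wkt in rows]
    conn.executemany("UPDATE features SET wkt = ? WHERE fid = ?", params)
    conn.commit()   #one commit per batch instead of one per row
```

With pyodbc the same idea applies: keep the ROW_NUMBER paging query, but collect the batch's new geometries first and issue the updates with `cursor.executemany()` inside a single transaction.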
痴者 2024-10-22 21:09:50

Try using a Session.

rs = s.execute() then becomes rs = session.execute(s), and you can replace the last three lines with session.execute(update_sql). I'd also suggest configuring your Session with autocommit off and calling session.commit() at the end.
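
Concretely, the suggested pattern might look like the following minimal sketch. It is run here against an in-memory SQLite database with made-up table and column names (the original SQL Server schema isn't available), and a simple string replacement stands in for ReprojectFeature:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

#toy database; the answer assumes the SqlSoup engine instead
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE features (fid INTEGER PRIMARY KEY, wkt TEXT)"))
    conn.execute(text("INSERT INTO features VALUES (1, 'POINT(0 0)'), (2, 'POINT(1 1)')"))

Session = sessionmaker(bind=engine, autoflush=False)
session = Session()

#read and update through the same session, committing once at the end
rows = session.execute(text("SELECT fid, wkt FROM features")).fetchall()
for fid, wkt in rows:
    new_wkt = wkt.replace("POINT", "POINT Z")   #stand-in for ReprojectFeature
    session.execute(text("UPDATE features SET wkt = :w WHERE fid = :f"),
                    {"w": new_wkt, "f": fid})
session.commit()
```

Note that the question's own update shows this exact structure still hanging on the full table with the SQL Server ODBC driver, so the session alone may not be enough there; the sketch only illustrates the shape of the suggestion.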

萌面超妹 2024-10-22 21:09:50

Can I suggest that when your process hangs you run sp_who2 on the SQL box and see what is happening. Check for blocked spids and see if you can find anything in the SQL code that suggests what is happening. If you do find a spid that is blocking others, you can run dbcc inputbuffer(*spidid*) and see if that tells you what query it executed. Otherwise you can also attach SQL Profiler and trace your calls.

In some cases it could also be parallelism on the SQL server that causes the blocking. Unless this is a data warehouse, I suggest turning Max DOP off (set it to 1). Let me know how it goes; when I check this again in the morning, if you still need help I'll be glad to assist.
