How to update records with SQLAlchemy while looping through them

Published 2024-10-15 21:09:48

I am trying to use SQLSoup - the SQLAlchemy extension - to update records in a SQL Server 2008 database. I am using pyodbc for the connections. There are a number of issues which make it hard to find a relevant example.

I am reprojecting a geometry field in a very large table (2 million+ records), so many of the standard ways of updating fields cannot be used. I need to extract coordinates from the geometry field to text, convert them and pass them back in. All this is fine, and all the individual pieces are working.

However, I want to execute a SQL UPDATE statement on each row while looping through the records one by one. I assume this places locks on the recordset, or that the connection is in use, because when I use the code below it hangs after successfully updating the first record.

Any advice on how to create a new connection, reuse the existing one, or accomplish this another way is appreciated.

from sqlalchemy import select, text

s = select([text("%s as fid" % id_field),
            text("%s.STAsText() as wkt" % geom_field)],
           from_obj=[feature_table])

rs = s.execute()

for row in rs:
    new_wkt = ReprojectFeature(row.wkt)

    #build a raw T-SQL update using geometry::STGeomFromText
    update_value = "geometry::STGeomFromText('%s', %s)" % (new_wkt, "3785")
    update_sql = ("update %s set GEOM3785 = %s where %s = %i" %
                  (full_name, update_value, id_field, row.fid))

    conn = db.connection()
    conn.execute(update_sql)
    conn.close()  #or not - no effect..

The updated working code now looks like this. It works fine on a few records, but hangs on the whole table, so I guess it is reading in too much data.

from sqlalchemy.ext.sqlsoup import SqlSoup
from sqlalchemy.orm import sessionmaker

db = SqlSoup(conn_string)
#create outer query

Session = sessionmaker(autoflush=False, bind=db.engine)
session = Session()
rs = session.execute(s)

for row in rs:
    #create update sql...
    session.execute(update_sql)
session.commit()

I now get connection busy errors.

DBAPIError: (Error) ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')

It looks like this could be a problem with the ODBC driver - http://sourceitsoftware.blogspot.com/2008/06/connection-is-busy-with-results-for.html

Further Update:

On the server, Profiler shows the select statement and then the first update statement "starting", but neither completes.
If I set the select statement to return only the top 10 rows, it completes and the updates run.

SQL: Batch Starting   Select...
SQL: Batch Starting   Update...

I believe this is an issue with pyodbc and the SQL Server drivers. If I remove SQLAlchemy and execute the same SQL with pyodbc it also hangs, even if I create a new connection object for the updates.

I also tried the SQL Server Native Client 10.0 driver, which is meant to allow MARS - Multiple Active Result Sets - but it made no difference. In the end I have resorted to "paging the results" and updating these batches using pyodbc and SQL (see below); however, I thought SQLAlchemy would have been able to do this for me automatically.

Comments (3)

や莫失莫忘 2024-10-22 21:09:51

Until I find another solution I am using a single connection and custom SQL to return sets of records, and updating these in batches. I don't think what I am doing is a particularly unique case, so I am not sure why I cannot handle multiple result sets simultaneously.

The code below works, but is very, very slow..

import pyodbc

cnxn = pyodbc.connect(conn_string, autocommit=True)
cursor = cnxn.cursor()

#get total recs in the database
s = "select count(fid) as count from table"
count = cursor.execute(s).fetchone().count

#choose number of records to update in each iteration
batch_size = 100
counter = 0
for i in range(1, count, batch_size):
    #sql to bring back relevant records in each batch
    #(half-open range, so boundary rows are not fetched twice)
    s = """SELECT fid, wkt FROM
               (SELECT ROW_NUMBER() OVER (ORDER BY fid ASC) AS RowNumber,
                       fid,
                       GEOM29902.STAsText() AS wkt
                FROM %s) features
           WHERE RowNumber >= %i AND RowNumber < %i""" % (full_name, i, i + batch_size)

    rs = cursor.execute(s).fetchall()
    for row in rs:
        new_wkt = ReprojectFeature(row.wkt)
        #...create update sql statement for the record
        cursor.execute(update_sql)
        counter += 1
cursor.close()
cnxn.close()
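
Part of the slowness likely comes from `autocommit=True` committing one round trip per row. A variant of the batch loop that reprojects each page in Python and then writes it back with a single `executemany()` and one commit per batch might look like the sketch below; it is shown against an in-memory SQLite table with made-up names, since the original SQL Server schema is not available.

```python
import sqlite3

#throwaway table standing in for the real feature table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (fid INTEGER PRIMARY KEY, wkt TEXT)")
conn.executemany("INSERT INTO features VALUES (?, ?)",
                 [(i, "POINT(%d 0)" % i) for i in range(1, 501)])
conn.commit()

def reproject(wkt):
    #stand-in for ReprojectFeature()
    return wkt.replace("POINT", "POINT Z")

count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
batch_size = 100
for offset in range(0, count, batch_size):
    #page through the table in fid order
    rows = conn.execute(
        "SELECT fid, wkt FROM features ORDER BY fid LIMIT ? OFFSET ?",
        (batch_size, offset)).fetchall()
    params = [(reproject(wkt), fid) for fid, wkt in rows]
    conn.executemany("UPDATE features SET wkt = ? WHERE fid = ?", params)
    conn.commit()   #one commit per batch instead of one per row
```

With pyodbc the same idea applies: keep the ROW_NUMBER paging query, but collect the batch's new geometries first and issue the updates with `cursor.executemany()` inside a single transaction.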
痴者 2024-10-22 21:09:50

Try using a Session.

rs = s.execute() then becomes rs = session.execute(s), and you can replace the last three lines with session.execute(update_sql). I'd also suggest configuring your Session with autocommit off and calling session.commit() at the end.
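
Concretely, the suggested pattern might look like the following minimal sketch. It is run here against an in-memory SQLite database with made-up table and column names (the original SQL Server schema isn't available), and a simple string replacement stands in for ReprojectFeature:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

#toy database; the answer assumes the SqlSoup engine instead
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE features (fid INTEGER PRIMARY KEY, wkt TEXT)"))
    conn.execute(text("INSERT INTO features VALUES (1, 'POINT(0 0)'), (2, 'POINT(1 1)')"))

Session = sessionmaker(bind=engine, autoflush=False)
session = Session()

#read and update through the same session, committing once at the end
rows = session.execute(text("SELECT fid, wkt FROM features")).fetchall()
for fid, wkt in rows:
    new_wkt = wkt.replace("POINT", "POINT Z")   #stand-in for ReprojectFeature
    session.execute(text("UPDATE features SET wkt = :w WHERE fid = :f"),
                    {"w": new_wkt, "f": fid})
session.commit()
```

Note that the question's own update shows this exact structure still hanging on the full table with the SQL Server ODBC driver, so the session alone may not be enough there; the sketch only illustrates the shape of the suggestion.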

萌面超妹 2024-10-22 21:09:50

Can I suggest that when your process hangs you run sp_who2 on the SQL box and see what is happening. Check for blocked spids and see if you can find anything in the SQL code that suggests what is happening. If you do find a spid that is blocking others, you can run dbcc inputbuffer(*spidid*) and see if that tells you what query it executed. Otherwise you can also attach SQL Profiler and trace your calls.

In some cases it could also be parallelism on the SQL server that causes the blocking. Unless this is a data warehouse, I suggest turning Max DOP off (set it to 1). Let me know how it goes; when I check this again in the morning, if you still need help I'll be glad to assist.
